{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Given the Seattle hotel dataset, which lists each hotel's name, address, and description, recommend the 10 hotels most similar to the one a user searches for"
   ]
  },
  {
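   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Step 1: extract features from the hotel descriptions (`desc`)\n",
    "- N-gram: use each run of N consecutive words as a feature\n",
    "- TF-IDF: select keywords according to `(min_df, max_df)` and build the TF-IDF matrix\n",
    "\n",
    "Step 2: compute the hotel-to-hotel similarity matrix\n",
    "- cosine similarity\n",
    "\n",
    "Step 3: for a given hotel, output the Top-K hotels with the highest similarity"
   ]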
  },
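  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of Steps 2 and 3 on a made-up corpus, assuming TF-IDF vectors as the features. The names `toy_names`, `toy_descs`, `query`, and `k` are illustrative and not part of the hotel dataset. Because `TfidfVectorizer` L2-normalizes each row by default, the dot product computed by `linear_kernel` equals the cosine similarity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "from sklearn.metrics.pairwise import linear_kernel\n",
    "\n",
    "# Hypothetical toy data, for illustration only\n",
    "toy_names = ['Hotel A', 'Hotel B', 'Hotel C', 'Hotel D']\n",
    "toy_descs = [\n",
    "    'downtown hotel near pike place market',\n",
    "    'quiet inn near pike place market',\n",
    "    'airport hotel with free shuttle',\n",
    "    'lakeside cabin with kayak rental',\n",
    "]\n",
    "\n",
    "# Rows are L2-normalized by default, so dot products are cosine similarities\n",
    "toy_tfidf = TfidfVectorizer().fit_transform(toy_descs)\n",
    "toy_sim = linear_kernel(toy_tfidf, toy_tfidf)  # shape: (4, 4)\n",
    "\n",
    "query = 0  # index of the hotel being searched for\n",
    "k = 2      # number of recommendations to return\n",
    "\n",
    "# Sort the other hotels by similarity to the query, highest first\n",
    "order = np.argsort(-toy_sim[query])\n",
    "top_k = [i for i in order if i != query][:k]\n",
    "print([(toy_names[i], round(float(toy_sim[query, i]), 3)) for i in top_k])"
   ]
  },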
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from nltk.corpus import stopwords\n",
    "from sklearn.metrics.pairwise import linear_kernel\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
    "import re\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
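    "## Step 1: extract features from the hotel descriptions (`desc`)\n",
    "- N-gram: use each run of N consecutive words as a feature\n",
    "- TF-IDF: select keywords according to `(min_df, max_df)` and build the TF-IDF matrix"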
   ]
  },
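  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the two extractors named above, run on a made-up three-sentence corpus rather than the hotel data. The names `toy_corpus`, `bigram_vec`, and `tfidf_vec` are illustrative assumptions. It prints the bigrams that `CountVectorizer` produces, then shows how `min_df` and `max_df` prune the `TfidfVectorizer` vocabulary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer\n",
    "\n",
    "# Hypothetical toy corpus, for illustration only\n",
    "toy_corpus = [\n",
    "    'hotel near pike place market',\n",
    "    'hotel near space needle',\n",
    "    'quiet inn near pike place market',\n",
    "]\n",
    "\n",
    "# Bigrams: every run of 2 consecutive words becomes one feature\n",
    "bigram_vec = CountVectorizer(ngram_range=(2, 2))\n",
    "bigram_counts = bigram_vec.fit_transform(toy_corpus)\n",
    "print(sorted(bigram_vec.vocabulary_))\n",
    "\n",
    "# TF-IDF over single words:\n",
    "# min_df=2 drops terms that appear in fewer than 2 documents,\n",
    "# max_df=0.9 drops terms that appear in more than 90% of documents\n",
    "tfidf_vec = TfidfVectorizer(min_df=2, max_df=0.9)\n",
    "tfidf_matrix = tfidf_vec.fit_transform(toy_corpus)\n",
    "print(sorted(tfidf_vec.vocabulary_))\n",
    "print(tfidf_matrix.shape)"
   ]
  },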
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>address</th>\n",
       "      <th>desc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Hilton Garden Seattle Downtown</td>\n",
       "      <td>1821 Boren Avenue, Seattle Washington 98101 USA</td>\n",
       "      <td>Located on the southern tip of Lake Union, the...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Sheraton Grand Seattle</td>\n",
       "      <td>1400 6th Avenue, Seattle, Washington 98101 USA</td>\n",
       "      <td>Located in the city's vibrant core, the Sherat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Crowne Plaza Seattle Downtown</td>\n",
       "      <td>1113 6th Ave, Seattle, WA 98101</td>\n",
       "      <td>Located in the heart of downtown Seattle, the ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Kimpton Hotel Monaco Seattle</td>\n",
       "      <td>1101 4th Ave, Seattle, WA98101</td>\n",
       "      <td>What?s near our hotel downtown Seattle locatio...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>The Westin Seattle</td>\n",
       "      <td>1900 5th Avenue, Seattle, Washington 98101 USA</td>\n",
       "      <td>Situated amid incredible shopping and iconic a...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147</th>\n",
       "      <td>The Halcyon Suite Du Jour</td>\n",
       "      <td>1125 9th Ave W, Seattle, WA 98119</td>\n",
       "      <td>Located in Queen Anne district, The Halcyon Su...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>148</th>\n",
       "      <td>Vermont Inn</td>\n",
       "      <td>2721 4th Ave, Seattle, WA 98121</td>\n",
       "      <td>Just a block from the world famous Space Needl...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>149</th>\n",
       "      <td>Stay Alfred on Wall Street</td>\n",
       "      <td>2515 4th Ave, Seattle, WA 98121</td>\n",
       "      <td>Stay Alfred on Wall Street resides in the hear...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>150</th>\n",
       "      <td>Pike's Place Lux Suites by Barsala</td>\n",
       "      <td>2nd Ave and Stewart St, Seattle, WA 98101</td>\n",
       "      <td>The perfect marriage of heightened convenience...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>151</th>\n",
       "      <td>citizenM Seattle South Lake Union hotel</td>\n",
       "      <td>201 Westlake Ave N, Seattle, WA 98109</td>\n",
       "      <td>Yes, it's true. Every room at citizenM is the ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>152 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        name  \\\n",
       "0             Hilton Garden Seattle Downtown   \n",
       "1                     Sheraton Grand Seattle   \n",
       "2              Crowne Plaza Seattle Downtown   \n",
       "3              Kimpton Hotel Monaco Seattle    \n",
       "4                         The Westin Seattle   \n",
       "..                                       ...   \n",
       "147                The Halcyon Suite Du Jour   \n",
       "148                              Vermont Inn   \n",
       "149               Stay Alfred on Wall Street   \n",
       "150       Pike's Place Lux Suites by Barsala   \n",
       "151  citizenM Seattle South Lake Union hotel   \n",
       "\n",
       "                                             address  \\\n",
       "0    1821 Boren Avenue, Seattle Washington 98101 USA   \n",
       "1     1400 6th Avenue, Seattle, Washington 98101 USA   \n",
       "2                    1113 6th Ave, Seattle, WA 98101   \n",
       "3                     1101 4th Ave, Seattle, WA98101   \n",
       "4     1900 5th Avenue, Seattle, Washington 98101 USA   \n",
       "..                                               ...   \n",
       "147                1125 9th Ave W, Seattle, WA 98119   \n",
       "148                  2721 4th Ave, Seattle, WA 98121   \n",
       "149                  2515 4th Ave, Seattle, WA 98121   \n",
       "150        2nd Ave and Stewart St, Seattle, WA 98101   \n",
       "151            201 Westlake Ave N, Seattle, WA 98109   \n",
       "\n",
       "                                                  desc  \n",
       "0    Located on the southern tip of Lake Union, the...  \n",
       "1    Located in the city's vibrant core, the Sherat...  \n",
       "2    Located in the heart of downtown Seattle, the ...  \n",
       "3    What?s near our hotel downtown Seattle locatio...  \n",
       "4    Situated amid incredible shopping and iconic a...  \n",
       "..                                                 ...  \n",
       "147  Located in Queen Anne district, The Halcyon Su...  \n",
       "148  Just a block from the world famous Space Needl...  \n",
       "149  Stay Alfred on Wall Street resides in the hear...  \n",
       "150  The perfect marriage of heightened convenience...  \n",
       "151  Yes, it's true. Every room at citizenM is the ...  \n",
       "\n",
       "[152 rows x 3 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load the Seattle hotel dataset\n",
    "df = pd.read_csv('Seattle_Hotels.csv', encoding=\"latin-1\")\n",
    "df  # 152 hotels, three columns: name, address, desc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write a function that prints a hotel's name and description"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"Located on the southern tip of Lake Union, the Hilton Garden Inn Seattle Downtown hotel is perfectly located for business and leisure. \\r\\nThe neighborhood is home to numerous major international companies including Amazon, Google and the Bill & Melinda Gates Foundation. A wealth of eclectic restaurants and bars make this area of Seattle one of the most sought out by locals and visitors. Our proximity to Lake Union allows visitors to take in some of the Pacific Northwest's majestic scenery and enjoy outdoor activities like kayaking and sailing. over 2,000 sq. ft. of versatile space and a complimentary business center. State-of-the-art A/V technology and our helpful staff will guarantee your conference, cocktail reception or wedding is a success. Refresh in the sparkling saltwater pool, or energize with the latest equipment in the 24-hour fitness center. Tastefully decorated and flooded with natural light, our guest rooms and suites offer everything you need to relax and stay productive. Unwind in the bar, and enjoy American cuisine for breakfast, lunch and dinner in our restaurant. The 24-hour Pavilion Pantry? stocks a variety of snacks, drinks and sundries.\""
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Look at the first hotel's description\n",
    "df['desc'][0]  # reads like the hotel's own promotional copy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>desc</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>What?s near our hotel downtown Seattle locatio...</td>\n",
       "      <td>Kimpton Hotel Monaco Seattle</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                desc  \\\n",
       "3  What?s near our hotel downtown Seattle locatio...   \n",
       "\n",
       "                            name  \n",
       "3  Kimpton Hotel Monaco Seattle   "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df.index == 3][['desc', 'name']]  # select the description and name of the hotel at a given index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['What?s near our hotel downtown Seattle location? The better \\r\\nquestion might be what?s not nearby. In addition to being one of the hotels near Pike Place Market, here?s just a small sampling of the rest. Columbia Center, whose Sky View Observatory on the 73rd floor is the tallest public viewing area west of the Mississippi Historic 5th Avenue Theatre, home to musical productions Seattle Central Library, an architectural marvel. Within half a mile: The must-see Pike Place Market, which houses the original Starbucks Pioneer Square, Seattle?s original downtown. Seattle Art Museum. Fantastic shopping, including the flagship Nordstrom, Nordstrom Rack, Macy?s, Columbia Sportswear, Louis Vuitton, Arcteryx, and oodles of independent boutiques. The Great Wheel. Washington State Convention Center. Within about a mile: The iconic Space Needle.  Bell Street Pier Cruise Terminal at Pier 66. Sports stadiums CenturyLink Field and Safeco Field, home to the Seattle Seahawks, Seattle Mariners, and Seattle Sounders.',\n",
       "       'Kimpton Hotel Monaco Seattle '], dtype=object)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# .values[0] takes the first (and only) row of the filtered frame as an array: [desc, name]\n",
    "example = df[df.index == 3][['desc', 'name']].values[0]\n",
    "example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'> (2,)\n"
     ]
    }
   ],
   "source": [
    "print(type(example),example.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'What?s near our hotel downtown Seattle location? The better \\r\\nquestion might be what?s not nearby. In addition to being one of the hotels near Pike Place Market, here?s just a small sampling of the rest. Columbia Center, whose Sky View Observatory on the 73rd floor is the tallest public viewing area west of the Mississippi Historic 5th Avenue Theatre, home to musical productions Seattle Central Library, an architectural marvel. Within half a mile: The must-see Pike Place Market, which houses the original Starbucks Pioneer Square, Seattle?s original downtown. Seattle Art Museum. Fantastic shopping, including the flagship Nordstrom, Nordstrom Rack, Macy?s, Columbia Sportswear, Louis Vuitton, Arcteryx, and oodles of independent boutiques. The Great Wheel. Washington State Convention Center. Within about a mile: The iconic Space Needle.  Bell Street Pier Cruise Terminal at Pier 66. Sports stadiums CenturyLink Field and Safeco Field, home to the Seattle Seahawks, Seattle Mariners, and Seattle Sounders.'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "example[0]  # index 0 is desc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Kimpton Hotel Monaco Seattle '"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "example[1]  # index 1 is name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Description of hotel 10:\n",
      "Hotel name: W Seattle\n",
      "Hotel description: Soak up the vibrant scene in the Living Room Bar and get in the mix with our live music and DJ series before heading to a memorable dinner at TRACE. Offering inspired seasonal fare in an award-winning atmosphere, it's a not-to-be-missed culinary experience in downtown Seattle. Work it all off the next morning at FIT®, our state-of-the-art fitness center before wandering out to explore many of the area's nearby attractions, including Pike Place Market, Pioneer Square and the Seattle Art Museum. As always, we've got you covered during your time at W Seattle with our signature Whatever/Whenever® service - your wish is truly our command.\n"
     ]
    }
   ],
   "source": [
    "def print_description(index):\n",
    "    '''Print the name and description of the hotel at the given index.'''\n",
    "    print('Description of hotel ' + str(index) + ':')\n",
    "    # Filter to the matching row; each row is an array [desc, name]\n",
    "    rows = df[df.index == index][['desc', 'name']].values\n",
    "    if len(rows) > 0:  # guard against an index that is not in df\n",
    "        example = rows[0]\n",
    "        print('Hotel name:', example[1])  # index 1 is name\n",
    "        print('Hotel description:', example[0])  # index 0 is desc\n",
    "print_description(10)\n",
    "# Given an index, the function prints that hotel's name and description"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write a function that returns the Top-K n-gram features in the hotel descriptions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      Located on the southern tip of Lake Union, the...\n",
       "1      Located in the city's vibrant core, the Sherat...\n",
       "2      Located in the heart of downtown Seattle, the ...\n",
       "3      What?s near our hotel downtown Seattle locatio...\n",
       "4      Situated amid incredible shopping and iconic a...\n",
       "                             ...                        \n",
       "147    Located in Queen Anne district, The Halcyon Su...\n",
       "148    Just a block from the world famous Space Needl...\n",
       "149    Stay Alfred on Wall Street resides in the hear...\n",
       "150    The perfect marriage of heightened convenience...\n",
       "151    Yes, it's true. Every room at citizenM is the ...\n",
       "Name: desc, Length: 152, dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['desc']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df['desc'])  # the column is a pandas Series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "feature names:\n",
      " ['00 night plus', '000 crystals marble', '000 sq ft', '000 square feet', '000 square foot', '10 000 sq', '10 best hotels', '10 km nites', '10 km seattle', '10 minute drive']\n"
     ]
    }
   ],
   "source": [
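    "# Extract trigram (n=3) features, dropping English stop words\n",
    "vec = CountVectorizer(ngram_range=(3, 3), stop_words='english')\n",
    "bag_of_words = vec.fit_transform(df['desc'])\n",
    "# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() requires 1.0 or later\n",
    "print('feature names:\\n', vec.get_feature_names_out()[:10].tolist())"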
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(152, 13098) <class 'scipy.sparse.csr.csr_matrix'>\n"
     ]
    }
   ],
   "source": [
    "print(bag_of_words.shape, type(bag_of_words))  # still a sparse matrix at this point"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0 0 1 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]\n",
      " ...\n",
      " [0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]\n",
      " [0 0 0 ... 0 0 0]] (152, 13098)\n"
     ]
    }
   ],
   "source": [
    "print(bag_of_words.toarray(), bag_of_words.shape)  # the same matrix converted to a dense array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1, 1, 2, ..., 1, 1, 1]], dtype=int64)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sum each column of the sparse matrix: total occurrences of each n-gram across all descriptions\n",
    "sum_words = bag_of_words.sum(axis=0)\n",
    "sum_words"
   ]
  },
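  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`bag_of_words.sum(axis=0)` adds up each column of the sparse matrix and returns a 1 x V `numpy.matrix`, which is why individual totals are read with two indices, as in `sum_words[0, idx]`. A small sketch on a made-up 2 x 3 count matrix (`toy_counts`, `col_totals`, and `flat_totals` are illustrative names):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.sparse import csr_matrix\n",
    "\n",
    "# Hypothetical counts: 2 documents (rows) x 3 n-grams (columns)\n",
    "toy_counts = csr_matrix(np.array([[1, 0, 2],\n",
    "                                  [0, 3, 1]]))\n",
    "\n",
    "# Column sums: one total per n-gram, kept as a 2-D object with a single row\n",
    "col_totals = toy_counts.sum(axis=0)\n",
    "print(type(col_totals), col_totals.shape)\n",
    "\n",
    "# Two indices are needed to read one total: row 0, column 2\n",
    "print(col_totals[0, 2])\n",
    "\n",
    "# Flatten to a plain 1-D array when single-index access is more convenient\n",
    "flat_totals = np.asarray(col_totals).ravel()\n",
    "print(flat_totals)"
   ]
  },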
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('located southern tip', 1),\n",
       " ('southern tip lake', 1),\n",
       " ('tip lake union', 1),\n",
       " ('lake union hilton', 1),\n",
       " ('union hilton garden', 1),\n",
       " ('hilton garden inn', 1),\n",
       " ('garden inn seattle', 1),\n",
       " ('inn seattle downtown', 3),\n",
       " ('seattle downtown hotel', 4),\n",
       " ('downtown hotel perfectly', 1)]"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n",
    "words_freq[0:10]  # show the first 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('pike place market', 85),\n",
       " ('seattle tacoma international', 21),\n",
       " ('tacoma international airport', 21),\n",
       " ('free wi fi', 19),\n",
       " ('washington state convention', 17),\n",
       " ('seattle art museum', 16),\n",
       " ('place market seattle', 16),\n",
       " ('state convention center', 15),\n",
       " ('high speed internet', 14),\n",
       " ('space needle pike', 12)]"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sort by frequency, descending\n",
    "words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)\n",
    "words_freq[0:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('pike place market', 85), ('seattle tacoma international', 21), ('tacoma international airport', 21), ('free wi fi', 19), ('washington state convention', 17), ('seattle art museum', 16), ('place market seattle', 16), ('state convention center', 15), ('high speed internet', 14), ('space needle pike', 12), ('needle pike place', 11), ('south lake union', 11), ('downtown seattle hotel', 10), ('sea tac airport', 10), ('home away home', 9), ('heart downtown seattle', 8), ('link light rail', 8), ('free high speed', 8), ('just minutes away', 8), ('24 hour fitness', 7)]\n"
     ]
    }
   ],
   "source": [
    "# Return the Top-K n-gram features found in the hotel descriptions\n",
    "def get_top_n_words(corpus, n=1, k=None):\n",
    "    # Build the n-gram count matrix\n",
    "    vec = CountVectorizer(ngram_range=(n, n), stop_words='english').fit(corpus)\n",
    "    bag_of_words = vec.transform(corpus)\n",
    "    # Total occurrences of each n-gram across the whole corpus\n",
    "    sum_words = bag_of_words.sum(axis=0)\n",
    "    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n",
    "    # Sort by frequency, descending\n",
    "    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)\n",
    "    return words_freq[:k]\n",
    "common_words = get_top_n_words(df['desc'], 3, 20)  # the 20 most frequent trigrams\n",
    "print(common_words)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                            desc  count\n",
      "0              pike place market     85\n",
      "1   seattle tacoma international     21\n",
      "2   tacoma international airport     21\n",
      "3                     free wi fi     19\n",
      "4    washington state convention     17\n",
      "5             seattle art museum     16\n",
      "6           place market seattle     16\n",
      "7        state convention center     15\n",
      "8            high speed internet     14\n",
      "9              space needle pike     12\n",
      "10             needle pike place     11\n",
      "11              south lake union     11\n",
      "12        downtown seattle hotel     10\n",
      "13               sea tac airport     10\n",
      "14                home away home      9\n",
      "15        heart downtown seattle      8\n",
      "16               link light rail      8\n",
      "17               free high speed      8\n",
      "18             just minutes away      8\n",
      "19               24 hour fitness      7\n"
     ]
    }
   ],
   "source": [
    "df1 = pd.DataFrame(common_words, columns = ['desc' , 'count'])\n",
    "print(df1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### N-gram: use each run of N consecutive words as a feature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgIAAAEICAYAAAAtNpw3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAABYUElEQVR4nO2debxVZfX/3x9xQhFH8ocjzjijgjlhaGRl5pAWmZlkaZZDWqaWZWhWaqU5lKZ+DVNSc0YtZxHEgVlGsVKc00xBcRbX74+1jmw259x77uVOcNf79Tqvu/ezn2Ht5+x7nrWf4fPIzEiSJEmSpHOyVHsbkCRJkiRJ+5GOQJIkSZJ0YtIRSJIkSZJOTDoCSZIkSdKJSUcgSZIkSTox6QgkSZIkSScmHYGkUyOplySTtHR729KeSFpT0khJb0r6XZ1pZkka2ELlm6SNWyKvlixL0hBJV7e2TfUi6QBJz0maK2m79rYnaRta+zlMRyDpVLRk41XK95D4cZ4r6R1JHxXO5zYzz09IukbSi5LmSBot6ZOlOF+T9IyktyTdImm1Zt7CkcCrQHcz+2EVW4ZKOrOZebcaS6IjJ+cpSdOrXP4tcIyZdQNeX9LuvbWRNEDS8+1tR0cjHYEkaQHMbJiZdYsf6M8DL1bOI6w5dAPGAjsAqwFXAndI6gYgaUvgT8ChwJrA28Afm1nW+sB0S4WxjsDuwCeADSX1K11bH5jWEoXU40Ckk9H2tEudm1l+8tMpPsBVwEfAO8Bc4CSgF2DAYcCz+FvxqYU0SwGnAP8G/gf8DVitkXIGAM8XzjcHRgCz8R/xfQvXhgKXAPcAbwIPAus3kPcbwA5x/Cvgr4VrGwHvAyvVSLsL7ljMib+7FGz4INLOBQaW0h1Zun5bhM8CTgQmR57XAcsX0u0DTIr7fhjYpoH7MuAo4J8R/w+ACt/BT4FngFeAvwArx7VnI+3c+Owc4YcDM4DXgbuKdRrxN65hxwbxHbwZ38lFwNWF6zvFvcwGHgcGFK4NBp6KtE8DhxSuHRH2vAlMB7ZvoC6uAIYBNwEXRdhycX8GvIU/j82996Ojnp+uUnaviPOtyH9knXl+L/J8E/gF/iw+jD+vfwOWLdXFv4DXgOHAWhF+MfDbkj23Aj+I47WAG4H/Rv0e10Ad7h31/CbwAv6croj/739UqLO1om5/D7wYn98DyxX/l4Gf4L8Nsyrfazwrs4Gl4vwy4JXS783xBduHxz3/CziiEG8IcANwddTXt2nkOWzx38bWyjg/+emIn/hHHlg4r/zwXQZ0BbYF3gM2j+vfBx4F1okfjD8B1zRSxgDCEQCWiX/8nwDLAnvGP/dmcX1onO8e+Z8PPFQj3z7Au8xvBG8FTi7FmUs4CqXw1fAf8UOBpYGD43z1gh1nNnBPC12PuhwTP3Kr4Q3FUXFtO7zR/iTQBXe0ZlV+YKvkb8DtwCrAeviP/efi2uFRhxvivSQ3AVeVvr+lC3ntF/E3j3v9KfBwqaxajsAjwLnxXewe383VcW1t3BncG3dOPhPnPfBG5o3C99oT2DKOv4w3Rv0AARtTw9kDVoh89gYOxBufZavZvgj3fk98X12rlF/J8y9xT13rzPNWoDuwJf7/c198XyvjDfJhEXfPuKfto44vZL6zsTvwHPMdwFXxhnutqO/xwGn4/9GGuNP12Rr1+BLQv5DP9uX/zULcM/D/8U/Ed/kw8ItC/A8Lz8SncEes8j0/y3zHfGbYtHnh2nZxPBLvrVse/z/+L7BnXBuCO9r7x312pYHnsFV+F1sr4/zkpyN+qO0IrFMIGwN8NY5nAJ8uXOsZ/7RLN1DGxz82QH/gP8RbQ4RdAwyJ46HAtYVr3YB5wLqlPLsDU4AfF8LuIxreQtgLFN5SC+GHAmNKYY8Agwt2NMcR+Hrh/Bzgkji+uPJjWrg+E/hUjfwN2K1w/jfglMJ9fq9wbbPKd0D1xvAfwLcK50vhwybrF8payBHA
HZAPgRULYX9lviNwMuGAFK7fhTs5K+JvhwdSamAjzvfrfD6/jjcSS+ONxhzggFI9NeQI1HPvezZQfiXPDZuY566F6+MpOKjA74Dfx/H/AeeUnvcPolzhjefuce0I4P44/iTwbMnWHwN/rnEfzwLfwee8VP3fLIT9G9i7cP5ZYFYhfvmZ+Bvwszi+CvgB8P/w5/scvGfr494CYF38f3qlQh6/BobG8RDCGarnOWyNT84RSBLnP4Xjt/EfKPAx2ZslzZY0G3cM5uFj8vWwFvCcmX1UCHsGf7us8FzlwMzm4t2Ha1XCJHUFbgMeNbNfF9LNxR2EIt3xt4dqdjxTCivb0RwaqrcfVuot6m5dCvfVhLzKtj+DN5S1voP1gfML5b6GNzKN3etawOtm9laprGK+Xy7d025Az0gzCG8EXpJ0h6TekW5dvLGph8OAv5nZh2b2Lt4VflidaSs2Nnbvz1VLWKIYp548Xy4cv1PlvOp3Gc/7/4C1zVu8a/HeKoCv4UMkFRvWKtX9T6j9DByI96o8I+lBSTs3cK/Vnq/ic1rtmahcfxB3FnbH3/pH4L0GnwJGxf/9WsBrZvZmKY9a30ljz2GLk45A0tmwJsZ/Dvi8ma1S+CxvZi/Umf5FYF1Jxf+19fA39wrrVg5iIuBqkQ5JywG34OOU3ynlPQ0fyqik3RDvSnyyhh3rl8LKdjREc+rtl6V6W8HMrmliPrCw7ZU3ppdr2PUc8J1S2V3N7OFGynkJWFXSiqWyivleVcp3RTM7C8DM7jKzz+C9Rk/gw02VdBs1dpOS1sG7zr8u6T+S/gMcBOwtaY0qSZp77/V8l8U4za3PaizwXUZdr8785/Aa4CBJ6+O9ADcWbHi6ZMNKZrZ3VePNxprZfnh3/y34W3z5vqrahH/nLxbOqz0TlesP4r1+A+L4IWBX3BF4sJD/apJWKuVR/N8r2tXYc9jipCOQdDZexscX6+US4Jfxw4SkHpL2a0L6x/C325MkLSNpAPBF/M2nwt6SdpO0LD7R6lEze07SMvgkonfwMdaPFsyaYcAXJfWPH40zgJtKbx4V/g5sGssNl5Y0CNgCH5evh6bW22XAUZI+GcvhVpT0hdKPYb1cA5wgaYNwlH4FXGdmH+Ld6B+VbLsE+HGsqkDSypK+3FghZvYMMA44XdKyknbDv6sKV+P1/VlJXSQtH8vR1pHrMOwX38N7eG9N5fu6HDhR0g5RFxtXnqcSh+JO3Gb4OHIfYFPcCTy4SvwWu/dGaMk8rwG+KalPOLm/Ah4zs1kAZjYRn0NwOXCXmc2OdGOANyWdLKlr1P9WVVZVEN/dIZJWNrMP8DkXle/iZWB1SSuXbPpp/G+vgc9DKK/ZrzwT/fFJsNeHvf/E/z+/DjxoZm9EGQcSjoCZPYfPO/h1PDPb4JMxq+oC1PEctjytNeaQn/x0xA8+8elZfPzuRKqPs44Avh3HS+FjgDPxLvd/A79qpIwBLLhqYEv8R2EOPnGqOOY7lPmrBubi3YsbxLVPhW1vM3+W81xiElTE+Vrcz1v4hK2aKxrwbuzxYcd4FhyTH0rDcwQ2Yf4KgFsibBYLzrcYwoIz7D+Hr06Yjb/lXE/tFQ0LjNsX7Ynv4DT8rfC/+A/oqoW4Z0T4bGCnCDsUn1PxRqS7olZZJTs2BEZFPVdbNfDJ+C5fizLvwN/Weha+49nxDG1RSHdUPENzganEJLJS2U8Ax1YJPwkYV6OeWuze43ovSv8PTc0TfyseXDg/E7i8VBf/jjq8ncL8nLj+s8jzy6XwtfBG+z/4RNdHKa1wiXjLAndGnDfwZ7D4rF+BD0fMjjyXBy7An9GX4nj54v8ycCruoDwLHFoq7xoKKzBwrYc3gS6FsHXiXl+Lez+qcG0IpfF/GnkOW/pTmZ2ZJEk7IGko7jT8tL1tSZJkQaIH72ozW6edTWlVcmggSZIkSTox6QgkSZIkSScmhwaSJEmSpBOTPQJJkiRJ0onJDSWSxY411ljDevXq1d5m
JEmSLFaMHz/+VTPrUQ5PRyBZ7OjVqxfjxo1rbzOSJEkWKyRVVSjMoYEkSZIk6cRkj0ALI+ly4Fwzmy5prjV/L/pa+bd4nq2FpCHAXDP7bZ3xf2Jmv2os3pQX5tDrlDsW1bwkSZLFillnfaFV8s0egRbGzL5tZtPb2472RlJznMyftLghSZIkSYOkI9AMJPWS9ISkYZJmSLpB0gpxbYSkvqX4a0h6JLTWe0i6UdLY+OxaJf/Bkm6NvP4p6edV4nSTdJ+kCZKmFPXvJX1D0mRJj0u6KsLqLfcWSfdImiXpGEk/kDRR0qOSVot4R0Qej0eelXsfKukSSY/h23EW8z5C0j9CJ/zrksZImiTpT6EbfhbQNcKGlW1LkiRJWod0BJrPZsAfzWxzXM/6e9UiSVoT1yM/zczuAM4HzjOzfvjGFJfXyH/HuL4NvvVp39L1d3HN+u2BPYDfxYYmWwI/xfcc3xb4fsSvt9ytgC8B/YBfAm+b2Xb43vXfiDg3mVm/yH8GvoFGhXWAXczsB4U6OAbfqGN/XMt8EL5/eR98S99DzOwU4B0z62Nmh5SNknSkpHGSxs17e04N05MkSZKmknMEms9zZjY6jq8GjsM3myiyDHAfcLSZVbakHAhsIakSp7ukbub7che5x8z+ByDpJnzDmOJUeQG/krQ7vrPW2vje3HsC15vZqwBm9loTy33AfPe6NyXNAW6L8Cm4UwKwlaQzgVXwfcbvKqS/3szmFc6/gW9Ssr+ZfSDp08AOwNiwpSvwCo1gZpcClwIs13OTVMFKkiRpIdIRaD7lxqha4/QhvsvbZ5m/N/VS+C5h7y5i/ocAPYAdooGdhe+iVYt6y32vcPxR4fwj5j8vQ/GG/XFJg/Eduiq8VcpvCr6d6jrA07gDc6WZ/bgRO5IkSZI2IB2B5rOepJ3N7BF8K9iHqsQx4HDgekknm9nZwN3AscBvACT1MbNJVdJ+Jsbk38G71A8vXV8ZeCWcgD2Ayv7m9wM3SzrXzP4nabXoFai33HpYCXhJ0jK4Q/JCA3EnAhcDwyV9Fu8huVXSeWb2StzjSuZ7cH8gaRnzPcRrsvXaKzOulWbPJkmSdDZyjkDzmQkcLWkGsCre2C1EdJMfDOwp6Xv4EELfmMw3Hd+buxpjgBuBycCNZlZW0BkW+UzBu9+fiPKm4WP7D0p6HDg34tdbbj38DHgMGF0ptyHM7CHgRHyuxCv4HIa7JU3G99ruGVEvBSbnZMEkSZK2IzcdagaSegG3m9lWrZT/YKCvmR3TGvkv7vTt29dSWTBJkqRpSBpvZuWJ59kjkCRJkiSdmZwj0AzMbBa+zK618h+KT8hLkiRJklalTRwBSasAXzOzP7ZFec1B0hnASDO7t4E4A4D3zezhNrBnAbldSQ+b2S4tXMZQfIjjhjrjH4XrCvylJe2IvHvh+gN/bSxuSgw3n9aSKE2SZPGlrYYGVqGG4E5HwcxOa8gJCAYATWqMmym1CyW53ZZ2ApqDmV1SzQlYhHsspu+Fr75IkiRJ2pC2cgTOAjYK+djfNEMet5ek+yP8PknrRfhQSReH/O1TkgZIukIu+zu0kOfFoUo3TdLp1QyMvA6K41mSTi/Y1zveWI8CToj76K8asr2Shki6StJo4Ko4v0IuGfyUpOMK5d4iaXzYdmSELSS3K2lu/FXU4dSwbVCED4j8b9B8+WPFtdPCvqmSLq2E10K1JYSHSDoxjkdI+r2kccD3NV9eeJykJyXtE/GWl/TnsHWifKljRc54uKT78SWFZwH9455PaOR5SpIkSVqItpojcAqwVUjKVt4ADzCzNyStATwqaTiwBb60bBcze1WhbQ9ciIvQXCnpcOACfG09+NK9nYF9geHArsC3ceW6ylr5U83sNUldgPskbWNmkxux+VUz216+5O9EM/u2pEso7KYn6a+4bO9D4ZzcBWwe6bcAdjOzd+S78PXGpYBXAmZKujjWyx8etnUNm280s1MkHVOprxJfwgV6tgXWiDQj
49p2wJbAi/jSvl1xfYOLzOyMsPkqXO73Nmpzk5ldFvHPxCWEL6wSb9nKDNRwvHrh0sgbAQ9I2hg4GjAz21pSb3zZ4KaRfntgm7j/AVHP+1QzKJykIwG6dO/RgOlJkiRJU2ivVQMVedzJwL00Lo+7M1AZO74Kl9utcJv5GsgpwMtmNsXMPgKm4Q0TwFckTcDFbbbEG+nGuCn+ji/kU2YgcJGkSbgT0l1SZYvg4Wb2TiHuHWb2XtzbK3G/AMfJ1/s/CqwLbNKIXbsB15jZPDN7GVcs7BfXxpjZ83H/kwp27yHpMbnmwJ54HTTEVpJGRfxDGoh/Xen8b2b2kZn9E3gKd352wyWYMbMngGeAiiNwT+E7bhAzu9TM+ppZ3y4rrFxPkiRJkqQO2mvVQFPlcRuiKIFblsddWtIGuJhNPzN7Pd5c6ymrktc8atdTVdne6HkvS+0WbZsXtg3AnYmdzextSSPqtK0xm4tlLA/8EdcleC56JxorYyi1JYSLlO+xHtnlhtLXRSoLJkmStBxt1SPwJt4lXqEhedwvS1odoDA08DDw1Tg+BBjVhLK74w3OHPlOgJ9v3i0AC99HRbYXcNneJua3MvB6OAG9gZ0K1z6QS/iWGQUMkm/d2wPYHVchrEWl0X81eisOqsOusoRwvXxZ0lKSNgI2xNUXR1XyiCGB9SK8TLlukyRJkjagTRyB2EVvdExW+w1Nl8c9FvhmDCUcyvytdesp+3F8SOAJfHhhdMMpGuQ24ICY0NafRZftvRN/a5+BT5Z7tHCtltzuzbjs8OO443SSmf2nVgFmNhu4DJiKz2EYW4ddTZIQLvAs7pT8Azgqekr+CCwV3/V1wGAze69K2snAvJigmJMFkyRJ2oiUGE5aBDVRk2BRSInhJEmSpqOUGE6SJEmSpMxi4whI2l/SFoXzwZLWKpyPkLSQp9NAfguk72hI6ivpgkbirBLLG9vCnnL9nyFpYOXczAYvam+AXAvh9kXJI0mSJGkai9NeA/sDtwPT43wwPu79YjPzW9T0rUpsO9xY//cquGJj3dLN8iUNiiWGTWF/CvVvZqc1MX2LkRLDzSclhpMkKdPqPQKSVpR0R0wCm6r5Sng7SHpQrqp3l6SeEb6Qqp2kXXDBoN/ERL2Tgb7AsDjvWipzL0mPyJUBry+s7a9cP6icXjXU9yRtLOnesGeCpI3k1FL3e1DSrXIFwbMkHSJpTMTbKOJ9Ub6uf2LkvSYlim/Hqq1MuIBiY8T9UdzHZIWKolyZcaakv+DOT3+5+uJlckXDuyt1WGf9b6QFlRg/HfcyJexcLsIXUmiM8B3j+5ko6WFJmzX3+UqSJEkWjbYYGvgc8KKZbWtmWwF3ypelXQgcZGY7AFfgqwXAVe36mdm2wAzgW7HJz3DgR2bWx8zOxt+WD4nzj4V75EqFPwUGmtn2Ee8HRYOiC7uc/qIodyugK66+B77C4Q9hzy7ASyyo7jcQbyB7Rvxt8dUDm+MrHDY1sx2By5m/1PAhXH9gO+Ba4KQ66rE38Flcue/nUYenAP+Oe/iRpL1wQaIdw74dJO0e6TcB/mhmW+KiPpvEfW0JzAYObEL9/7tQ38vjugODzGxrvJfpuwW7X43v4WJczwF8JUL/uP/TgF+RJEmStAttMTQwBfidpLPxWeWjJG2Fb+N7T7x4d8EbWHBVuzPxbu9u+JK3prATrhw4OvJeFnikjnR7SDoJWAFYDZgmF/hZ28xuBqgIB0n6WN0PeFlSRd3vDWCsmb0U8f6Naw1U6mGPOF4HuC6ch2WBp+uw745YdveepKIyYZG94jMxzrvhDf6zwDNmVlye+HTIL8OC6olNrf/NIq8n4/xKXFb493FeVGj8UhyvDFwpaRNcdKiaXsICKCWGkyRJWoVWdwTM7ElJ2wN7A2dKug9fCz/NzHaukmQo9ana1UK4dO3BdSdonvpeLcrqhkXlw0p9Xwica2bD5eqCQ5qYby21QwG/NrM/LRDoGyY1pnRYGV4ZyqLV
f5lqCo2/AB4wswPCthGNZWJml+LaCizXc5Nc85okSdJCtLojIJ+Z/5qZXS1pNr4h0FlAD0k7m9kj0c29aQgKlVXtXoisyspztZToHgX+IGljM/uXpBXxt/onS/GK6aup791gZm9Kel7S/mZ2S4x9d8HV8r4j6Uq892B34Ed49309rFy4r8PqTFONch3cBfxC0jAzmytpbeCDJuZZb/1XmAn0qtQ3PhzyYCNlFO9/cBPtS4nhJEmSFqQt5ghsDYyRb8zzc+BMM3sfb2zPlisITsLH36G2qt21wI9igtlG+JvrJSpNFjSz/+KNyzVyJcJHqN5Af5wef2utpb53KL4x0GRc6vj/0UR1vyoMAa6XNB54tQnpFqCs2Ghmd+PqiY/IlfxuoOmyvfXWf8WGd4Fvxv1MwXs+LmmkjHOAX0uayOK1ciVJkmSJI5UFk8WOVBZMkiRpOkplwSRJkiRJyqQjkCRJkiSdmHQEOhGSjgshofKOhm1R9sON2SRpX0mntLVtSZIknZmcI9CJkPQELrT0fCl8aTP7sCPZ1BDL9dzEeh72+9YzajEmJYSTJKlFzhHo5Ei6BNgQ+IekE+SyxVdJGg1cJalHSAqPjc+ukW7FkA0eEysG9quS9x8k7RvHN0u6Io4Pl/TLOJ5bh02DJV3UapWQJEmSLEQ6Ap0EMzsK32BpDzM7L4K3wN/GDwbOB84zs3643PDlEedU4P6QSd4Dl1NesZT9KKB/HK8d+RJhI5toU1UkHSlpnKRx896e0/gNJ0mSJHWRa7g7N8ML+zQMBLYIWWaA7iGutBewr6TKPgHLA+vh+xBUGAUcL9+meDqwasgn7wwcRwuQyoJJkiStQzoCnZui7PBS+EZI7xYjyD2DA81sZq1MzOwFSavgG0yNxNUWvwLMNbM3W9zqJEmSpMVIRyCpcDe+O2JlO+M+sSnRXcCxko41M5O0nZlNrJL+UeB4YE9gdVzV8IbWMDQlhpMkSVqOnCOQVDgO6CtpsqTp+FbK4BsELQNMljQtzqsxClg69huYgPcKjGplm5MkSZJFJJcPJosdKTGcJEnSdHL5YJIkSZIkC5GOQJIkSZJ0YhYLR0DSUEkHVQlfS1KzJ6RJ2j+WvLUakn7SkvE6MuX6lHSGpIHtaVOSJEnSMIvFHAFJQ4HbzaxFZ6G3Vr6lMuaaWbeWiteRaYv6hJQYThnhJEmaQ5vOEZD0I0nHxfF5ku6P4z0rG95IujiU4qZJOr2Q9ixJ02P2+m8L2e4u6WFJT1V6ByT1kjQ1jgdLuknSnZL+KemcQp7fkvRkyOReJukiSbsA++JKeZMkbSSpj6RHo+ybJa0a6UdIOjvSPympPyUk9ZQ0MvKaKqm/pLOArhFWue9bJI2P+z6ycs9V4n09ypsk6U+SulQps1/UyeMRdyVJy0v6s6QpIQm8R0P1I+koSb8p5PmxzG8tGyTNlfTLKPdRSWvWqM+Pe3IkfTrsmSKXLF4uwmdJOl3ShLjWu87HLEmSJGkBWmtooCg52xfoJmkZFpScPTU8k22AT0naRtLqwAHAlma2DXBmIc+ewG7APsBZNcrtAwwCtgYGSVpX0lrAz4CdgF2B3gBm9jAwHPiRmfUxs38DfwFOjrKnAD8v5L10yOweXwqv8DXgLjPrA2wLTDKzU4B3Iv9DIt7hZrZD1MtxklYvx5O0edzHrpHfPOCQYmGSlgWuA75vZtviyoDvAEf77dnWwMHAlZKWr1U/wI1R5xUGAdc2YsOKwKNR7kjgiBr1WbF1eWAoMCjsWhr4bqHMV81se+Bi4ESqoJQYTpIkaRVayxEYD+wgqTvwHvAI3vD1Z/7a8q9ImgBMBLbE9ennAO8C/yfpS8DbhTxvMbOPzGw6sGaNcu8zszmhjjcdWB/YEXjQzF4zsw+A66sllLQysIqZPRhBVwK7F6LcVLi3XlWyGAt8U9IQYOsGFPWOk/Q4LsCzLrBJlTifBnYAxkqaFOcbluJsBrxkZmMBzOyN2EFwN+Dq
CHsCeAbYNNIsVD9m9l/gKUk7hSPWGxjdiA3vA7c3Uh9lW582syfjvKl1i5ldamZ9zaxvlxVWbqS4JEmSpF5aRVnQzD6Q9DQwGHgYmIxvWLMxMEPSBvibXz8ze10+try8mX0oaUe80TkIOAZXqgN3KCqI6hTjzKNl76+Sd9V8zWykpN2BLwBDJZ1rZn8pxpE0AH9z39nM3pY0AtfuLyPgSjP7ccuZD9Sun2txSeAngJtDQbAhGz6w+ZNLWqKeG6zbJEmSpPVozR/dUXhjfzjezX4uMD4ame64zv0cSWsCnwdGyDe5WcHM/i7fHvepFrBjLPD7GO9/E99Zb0pcexNYCcDM5kh6XVJ/MxsFHAo8WC3DakhaH3jezC6L8e/t8aGGDyQtE70RKwOvhxPQGx+uqFCMdx9wq6TzzOwVSasBK5nZM4X4M4GekvqZ2VhJK+FDA6PwLvz7JW2KbxA0M+ypxc34LoPbASdHWD02lPm4PkvMBHpJ2jiUB5tUt2VSYjhJkqTlaG1H4FTgETN7S9K7EYaZPS5pIv4G+hzeFQ3eiNwaY8oCfrCoRsSGOL8CxgCvRZmVQeZrgcvkExsPAg4DLpG0Au6EfLMJRQ0AfiTpA2Au8I0IvxSX552AO0VHSZqBN46PFtJ/HC/mCfwUuFvSUsAH+Nj/x42wmb0vaRBwoaSuuBMwEPgjcLGkKcCHwGAze0+q1YkC0SszA9jCzMZE2PTGbKhCuT4r+b8r6ZvA9ZKWxp2zSxrIJ0mSJGkjFovlg4uKpG5mNjcaoZuBK8zs5va2K2keKTGcJEnSdNTJJYaHxIS3qcDTwC3tak2SJEmSdBA6xcQsM6u6JC1JkiRJOjudwhFoDyTtDzwZyx2RNBi428xejPMRwIlm1qJ93LEy4f1Y179EMuWFOfQ65Y72NqPNSCXBJElak84yNNAe7I9rI1QYDKzVmgXGHIgBwC6tWU6SJEmy5JCOQAlJK0q6I+Rzp8bMfCTtIOlBuTzwXZJ6RvgRksZG/BslrVBFbvdkXFBpWJx3LZW5l6RHQmb3+lhGWbZroXIifKikSyQ9BvwNOAo4IcrpX8pjiKQrJY2S9IykL0k6Ry7te6dc/bEi+7tGHPeN3gskfSrynSSXC14pwn8Utk1WyEWrIP8c5yfKxZYqks3nyZUCZ8ilkm+SSx8X1SSTJEmSViYdgYX5HPCimW1rZlsBlQbyQuCgkAe+AvhlxL/JzPqF3O4M4FtV5HbPBsYBh8T5O5XCosH9KTAwZHbHUX3Z5ELlFK6tA+xiZl/Cl+WdF+WMqpLPRrhI0764AuEDIfv7Di6G1BAnAkeH5HB/4B1Je+HqiDviEsY7yIWVGuP9mL16CXArvjRxK2CwXOFwAZQSw0mSJK1CzhFYmCnA7ySdje+kN0rSVngjdU+sx+8CvBTxt4q32FWAbsBdTSxvJ3wIYXTkvSwuyVymoXKuN7N5dZb3j1B+nBL3cWeET6FxqeDRwLnyjZFuMrPnwxHYC5eKJmzbBHi2kbyGF8qdZmYvAUh6Cpde/l8xspldimstsFzPTZb8Na9JkiRtRDoCJczsSUnbA3sDZ0q6D9cemGZmO1dJMhTYP0SSBuNj9E1BwD1mdnAj8Roq560mlPcegJl9JKkoFfwR85+HD5nfW/SxBLKZnSXpDrxuRkv6bNj/azP70wI3Ja3Dgj1OZSnliqzwRywofVy0I0mSJGll8ge3hHy3wtfM7GpJs4Fv47sd9pC0s5k9EkMFm5rZNFwN8aUIOwR4IbIqy+3Wkt99FPiDQn5X0orA2oUNeirUKqfMm0D3Jt52mVn4hkP/wCWZAZC0kZlNAaZI6odvUHQX8AtJw0K0aW1chfBl4BPRzT8X3zXyTlqAlBhOkiRpOXKOwMJsDYwJAaKfA2ea2fu4ZO7Z8p0DJzF/Zv7PgMfwbvMnCvlci0sOT5S0Ef5Gf0l5smDs/jcYuEbSZHxYoHcVu2qVU+Y24IBqkwWbwOnA+ZLG4RsBVTg+
JlBOxhv7f5jZ3cBfgUdiuOEGfE+CD4AzcGnnexqxOUmSJGknOoXEcLJkkRLDSZIkTUedXGI4SZIkSZIqpCOQJEmSJJ2YnCxYA7WSBHBrIJcVPtHM9qkz/mAKcsetYE8vXNfgr3HeB1jLzP5eKL+vmR3TnPxTYjhJkqTlyB6BxRy5rHBTGUzryh33Ar5WOO+DLzlMkiRJOhid2hEIGdwnJA0LqdsbKtK9pXgXh6rdtIqEboT3k/RwyP6OkbSSpC6SflOQ3P1OA+UOlfRklD9Q0uiQ2d0x4u0olx6eGOVsFuGDJQ2XdD9wXynvfpWVCqoiiyzpIBqWOz5O0vSw/doIW1HSFXGPEyXtV7iPUXJp5AlyaWXw5Zb9NV9e+QxgUJwPKpXXQy6ZPDY+uzbxa0ySJEkWgRwagM1wWeDRkq4Avgf8thTnVDN7TVIX4D5J2+DL4a4DBpnZWEndcZnebwFzzKyfpOVw4Z27zezpUp4bA18GDgfG4m/Qu+HSvz/BNy16AuhvZh9KGgj8ivnr+rcHtgm7BgBEQ3whsB+ufHgVsJ+Z/Tca4F+a2eGSjqH2sMcpwAZm9p6kVSr3D9wfaVfBl1feC7wCfMbM3pW0CXAN7mScQmGoQtLLFIYCYmigwvm4JPJDktbDdQk2Lxsl6UjgSIAu3XtUMTtJkiRpDukIwHNmNjqOrwaOY2FH4CvREC0N9MQlgQ14yczGApjZG+AbCAHbxJs3wMq45G7ZEXg6xHmQNA24z8ws1uL3KqS9MhpZA5YppL/HzF4rnG+OS/DuZWYvqmFZ5IaYjPcW3ALcEmF7AftKOjHOlwfWA14ELoo5APOATevIv8xAYIuwEaC7pG5mNrcYKSWGkyRJWod0BLyBrXkuaQN8s51+Zva6pKEsLJe7QBLgWDNrbM+BsqxuUXK38r38At8U6ICYgDeikKYsK/xS2LUd3kCL2rLIDfEFYHfgi8CpkraOvA40s5nFiPLdBF8GtsWHmd5tYllEup3MrDlpkyRJkkUkHQFYTyEdjHfPP1S63h1vdOdIWhP4PN4gzwR6SuoXQwMr4UMDdwHflXR/bO6zKfCCmTVlP4AKKzNfSnhwI3Fn48MS90h6C3iY2rLIVeWOJS0FrGtmD0h6CPgq8zc4OlbSsdFrsZ2ZTQz7no99Cw7Dex2okn8teWWAu4Fjgd+EDX3MbFJDN5oSw0mSJC1Hp54sGMwEjpY0A1gVuLh40cwex3fWewKX0h0d4e8Dg4AL5bLD9+Bv5JcD04EJkqYCf6L5Dtc5wK8lTawnDzN7Gdf0/wPeM1BLFnkoVeSO8Yb86hiemAhcYGaz8Z6JZYDJMYzxi4j/R+CwyL8383spJgPzYhLlCcADePf/QpMF8aGYvjE5cTpwVB31kiRJkrQQnVpiOLrbbzezrdrblqR+UmI4SZKk6SglhpMkSZIkKdOp5wiY2Sx8Zn2SJEmSdEqWSEdA0vHApWb2dkvE68iEhsD7ZvZwnB8FvG1mf2lPuyqoJDfcEizpEsMpKZwkSVuypA4NHA8spBC4CPE6MgOYPwkQM7ukozgBQS8WlBtuFDVPNjlJkiRpBou1IxDSt3fE7PSpkgZJOg7X0X9A0gMRbyGJ4Brx9pJL+k6QdL2kblXK3FjSvVHmBLmUr+SywlMlTanMjJc0QNIIuXRxRcpYkj4n6fpCngMk3d6QDZJmSTo9wqdI6h1v20cBJ8SM/P6ShlSEfyT1kfRozMi/WdKqET5C0tlyyeAnJfWvUb8nR1mPSzorwjaSdKdctniUpN4RPlTSBXIp5Kc0X1CpKDd8gmpIMEcdjJI0HF91kSRJkrQBi7UjAHwOeNHMto2Z/3ea2QW4oM4eZrZHxDs1ZkpuA3xK0jbleJLWAH4KDDSz7YFxwA+qlDkM+IOZbYu/ib8EfAnfWGdbXCnvN5J6Rvzt8J6HLYANgV2Be4FPSlox4gwCrq3Dhlcj/GJcwncWcAku0dvHzEaVbP0LcLKZbQNMAX5euLa0
me0Ytv28lA5Jn8elij8Z93pOXLoUF0zaARda+mMhWU9cJnkf3AEAlxseFfadR0GCGegHHCEXbQKXTf6+mS2kUCjpyHDmxs17e075cpIkSdJMFvcu2CnA7ySdjS8DLDeEFapJBE8uxdkpwkfL5W6XBR4pRpCLBq1tZjcDVNTwJO0GXGNm84CXJT2IN3JvAGPM7PmINwnoFbr6dwJflHQDruZ3EvCpRmy4Kf6Ox52PmkhaGVjFzB6MoCuB6wtRinn1qpLFQODPlfkTsadBN9z5uV7zJYGXK6S5xcw+AqbLxZeqUUuC+X28rspSzET5KTGcJEnSCizWjoCZPSlpe3yL2zMl3WdmZxTjqH6JYOH6/Qe3sJlFKeF5zK/za4FjgNeAcWb2prx1bciGSl7FfBbVrqbktRQw28z6NJIneH1Wo6oEs3zSY3PUF5MkSZJFYLF2BCStBbxmZldLmg18Oy5VJG1fpbZEcDneo8AfJG1sZv+Kbvu1zezJSnnRWD8vaX8zu0W+u2AXYBTwHUlXAqvhWv0/wtX2avEgcAVwBO4UUI8NVXgz7nEBzGyOpNcl9Y+ekkOjzHq5BzhN0jAze1vSatEr8LSkL5vZ9eG4bBPqiw3ZV5QXrirB3AS7UmI4SZKkBVnc5whsjW+JOwkf5z4zwi8F7pT0QC2J4Crx/ovr+V8jaTLeJV+tIT8UOC7iPAz8P+BmfKjhceB+4CQz+09Dhscwwu24Y3J7hNVrQ5HbgAMqkwVL1w7D5ytMxucwnFFO3IB9dwLDgXFRv5WdBw8BviWXFZ6GzyNoiLLccEtKMCdJkiSLSKeWGE4WT1JiOEmSpOkoJYaTJEmSJCmTjkCSJEmSdGJybLaEGtiRUNIZwEgzu7eB9EOAuWb221YzsonESonbzeyGUng99zOAgoRxayLpJ2b2q8biLWkSwykpnCRJe5KOQBMws9Pa24aWpM77GQDMxSdG1oWkpc3sw2aY9BOgUUcgSZIkaTlyaKA6XSRdJpckvltSV/hYRvegON5bLhs8PqR1by+k3yJkfJ+SSxkvQMjsDtV8SeITInyEpPNjBcBUSTtG+IqSrpBLAk+UtF8hn2pyvZJ0kaSZku4FPlHtJkv3U6+EcQ9JN0aZYyXtGumHSLpK0mjgqji/olo9SPp63MskSX+K+zgL6Bphwxbp20uSJEnqJh2B6myCywhvCcwGDixelLQ8vuzt8yG126OUvjfwWWBH4OeSlild74PrA2xlZlsDfy5cWyEEe76H6wwAnArcH5LAe+BLAlektlzvAcBmuErhNyhsStQI9UgYnx/n/aJeLi+k3wKXR64IIi1UD5I2xyWVd437nAccYmanAO9EOYeUDVNKDCdJkrQKOTRQnafNbFIcV5Pg7Q08VZDDvQY4snD9DjN7D3hP0ivAmsDzhetPARtKuhC4A7i7cO0aADMbKam7pFVwWd59FZsJ4cqI61Fbrnd35ksevyjp/jrvux4J44F4j0flvLvmb8403MzeKcStVg+fBnYAxkYeXYFXGjMsJYaTJElah3QEqlOWBe66iOkXqOeQOt4Wf1s+CvgKcHjlcikvw2V5DzSzmcULoexXTa537ybaW7a7IdnhpYCdKvssFMqEhSWCq9WDgCvN7MfNtDGVBZMkSVqQHBpoHjPxN/pecT6oKYnluwwuZWY34rsNbl+4XNnCeDe8238OLst7bDT8SNou4lbkepeJ8E1jyGAkMCjG3nviwwnNpSwRfDdwbOFe+jQxv/uAgyR9ItKvJmn9uPZBlWGUJEmSpBXJHoFmYGbvSPoeLk/8FjC2iVmsDfxZUsURK74dvytpIrAM83sJfgH8HpgcaZ7Gt/q9HB+2mBBOwn+B/XHJ4z1xKd9nKe2i2ERuA26ICYrHAsfh+yFMxp+fkXivRl2Y2XRJPwXujnv5ADgaeAbv+p8saUK1eQJJkiRJy5MSw81EUjczmxsN8B+Af5rZeYuY5wh8kl7q5zZASgwnSZI0HaXEcItz
hHwznmn4JL0/ta85SZIkSdJ00hFoJmZWWVK3hZkdYmZvt0CeA9qzNyDW/p8Yxx9rDCxinn0lXVDOP0mSJOkY1DVHQNLRwDAzmx3nqwIHm9kfW9G2ZAkgHJsWdW4WR4nhlBFOkqSjUm+PwBEVJwB8+RtwRKtY1MEJlb87JD0e6n+VWf6zJJ0TqnxjJG0c4V+U9JhcEfBeSWtGeDdJf474kyUdGOF7SXokFP6uL6zRL9owQtLZUc6TkvpHeFWlwbj2o0L46YXwUyOPh3ARomr3vIOkB+UqinfFSoRynKGSLgnRnycl7RPhA7Sg6mIl/hGS/iGpq6ooDTbpS0mSJEmaTb2OQJfK0jXwBgdYtnVM6vB8DnjRzLaNjYnuLFybE0qBF+Gz/AEewtfdbwdcC5wU4T+rxDezbYD7Y1nhT3F1vu3xN+kf1LBj6VAaPB74eYRVVRqUtBcuNLQjrmq4g6TdJe0AfDXC9o40CxDL+S4EDgoVxSuAX9awqVeU8QXgErkC40JIOgZf9bB/pFlIabBG/kmSJEkLU+/ywTuB6yRVJsR9hwUbwM7EFOB3ks7Gd/QbVbh2TeFvZQXBOnjd9cSdp4oa4UC8EQY+FhnaB5fpHR1+17LUXvpXVAHsFce1lAb3is/ECO8W4SsBN1fmN0gaXqWczYCtgHvCpi7ASzVs+puZfQT8U9JTuAJjmW8AzwH7m9kHkupSGpR0JKHe2KV7WdE5SZIkaS71OgIn4z/C343ze1hQY77TYGZPStoef4M+U9J9ZnZG5XIxavy9EDjXzIbLt/Qd0kD2Au4paPU3RDUVwFpKg58Ffm1mfyqFH19HOQKmmdnOdcStpopYZgreA7EO7hTVpTSYEsNJkiStQ12OQLzlXYJ3964GrBM69p0OSWsBr5nZ1ZJmA98uXB4EnBV/K2/yKwMvxPFhhbj34EI6x0e+qwKP4mI9G5vZv+QqgWub2ZN1mldRGrw/3rY3jbLvAn4haVhoH6yNC/mMBIZK+jX+LHyRhZdBzgR6SNrZzB6JoYJNzWxalfK/LOlKYANgw0i7UynORHxTo+HhoNwH3CrpPDN7JZ6vlczsmVo3mRLDSZIkLUe9qwZGAPtG/PHAK5IeNrMTWtG2jsrW+O5/H+GN6XcL11aVK+69B1Te6ocA10t6HbgfbyQBzsQb/an4W/3pZnaTpMHANZKWi3g/Bep1BKoqDZrZ3fJd/x6J7ve5wNfNbIKk64DH8e74hRQSzez9GGq4QNLK+DPwe1w/ocyzwBigO3CUmb1bmFpSzPMh+TLCO4DPxD1WUxpMkiRJWpm6lAUlTTSz7SR9G1jXzH4uaXJMckvwVQNAXzN7tb1taQ8kDcXnTNzQ2mWlsmCSJEnT0SIqCy4dk92+Aiy0FCxJkiRJksWTeicLnoGPM482s7GSNgT+2XpmLX6YWa/2tqE9MbPB7W1DkiRJ0nTq6hEws+vNbBsz+26cP2VmB7auaZ0XuTjRGnE8t4XyPEPSwHL+LUVr5JkkSZK0PvVOFtwUn+m9ppltJWkbYF8zO7NVrUtaDDM7rb1taCkWB4nhlBROkmRxod45ApcBP8ZndGNmkymI4XRGJPWSNEPSZZKmSbpbUte4tpGkO0OSd5Sk3hHeQ9KNIfU7VtKuEb56pJ8m6XJ8bX21MqvKBJfizJV0XuR1n6QeEb7QJkIh7/uPkPtdUdIVIfU7UdJ+VfIeIGmkXGJ5plxSeKFnSNItce/TQgioEv45uXTy45Lui7BGy02SJElaj3odgRXMbEwp7MOWNmYxZBPgD2a2JTAbqAyXXIoL++wAnAhUNmc6HzgvJIAPZL4o08+BhyKfm4H1ygWphkxwFZtWBMZFXg8yX364TDfgNuAaM7sMOBW4P2SL98CXSK5YJd2OwLG4AuJGwJeqxDk87r0vcFw4Oj1wh/JAM9sW+HLErbfcJEmSpBWod7Lgq5I2IpTi4s2ylsxsZ+JpM5sUx+OBXvJNgnbBtQMq8SqaAAOBLQrh
3SP+7kSDamZ3hOZAmVoywSNL8T4Crovjq5kvRVzmVuAcMxtWyH9fzd8meHncIZlRSjfGzJ4CkHQNsBtQXjJ4nKQD4njdsLMHMNLMno77fK0p5SolhpMkSVqFeh2Bo/G33N6SXsClYXNjmPkyv+CiQF3xXpbZsYFOmaXwDYjeLQZWE92pgqgiE1wHtYQiRgOfk/RXczEJ4W/rM5uY3wLnchnlgcDOZvZ2iFFV3XyokqSeclNiOEmSpHVo0BGQVNz57u/AA3hj9hbetX1u65m2eGJmb0h6WtKXzez6UPjbxsweB+7Gu9V/AyCpT/QojAS+hu9d8Hlg1SpZV5UJNrPyBj1LAQfhOx1+Dd/9sBqnxecPwPci/2MlHWtmJmk7M5tYJd2OkjbAlf8GEY1zgZWB18MJ6M18ieFHgT9K2sDMnpa0WvQK1Fvux6TEcJIkScvR2ByBleLTF5fSXRVYBTgK2L5VLVu8OQT4lqTHcSneygS444C+MdlvOl6PAKcDu0uahg8RPFvO0MzuBv6KywRPwbvjV6pS9lt4Yz0V2BPXgKjF94Guks4BfgEsA0wOO35RI81YfJvlGXjP0M2l63fiAlQz8H0XHg37/4t37d8U9VIZvqi33CRJkqQVqFdieCTwBTN7M85XAu4ws2qT1ZJ2RNJcM+vWSnkPAE40s31aI/96SYnhJEmSprOoEsNrAu8Xzt+PsCRJkiRJFmPqnSz4F2CMpEo38P7A0NYwKFk0Wqs3IPIeAYxorfyTJEmStqdeieFfAt8EXo/PN83s161pWGcmhHt2KZwvJAZUI12DcsQhgjS1JWysBxVkjZMkSZKOSb09ApjZBGBCK9qSzGcAMBd4uJ3tWCRaS9Y4JYaTJElajnrnCCR1EHK5d4SE7lRJgyL80yGfOyXkdJeL8OLmQn0ljZDUC19NcIKkSZL6R/a7S3pY0lON9Q5I6hbywhOizGpywRuGTf1UQxK5FH9IQfSHuL9ealhq+eOejEbq4PSCrQuVnSRJkrQe6Qi0LJ8DXjSzbc1sK+BOScvj8ykGmdnWeC/Md2tlYGazgEtwKeI+ZjYqLvXEVfz2wZflNcS7wAFmtj0u2/s7FVSLJG0G3AgMNrOx1JZErpdaUsuV8hqrg1fD1ouj/IWQdKSkcZLGzXt7ThPNS5IkSWqRjkDLMgX4jKSzJfU3sznAZrgU8ZMR50pcUrip3GJmH5nZdBpfsSHgV5ImA/cCaxfS9MDlhQ8xs8dLksiTgD/hTkdTWEhquXS9sTqoyCBXSwu4sqCZ9TWzvl1WWLmJ5iVJkiS1qHuOQNI4ZvakpO2BvXGVwPvwRrcWHzLfGWtIhhcWlDNuTJP4ELzB38HMPpA0q5D/HFywaDdgOg1LIteytWxvNanlplBJP498JpMkSdqU/NFtQSStBbxmZldLmg18GzgH34xoYzP7F3AovisgwCxgB+AfLNid/ibQfRFMWRl4JZyAPYD1C9feBw4A7grxob82IIlcZBY+LEE4Oxs0wZ6Z1K6DJpMSw0mSJC1HDg20LFvjeguT8O1/z4wNhr6Jd71PwXcHvCTinw6cL2kc/jZc4TbggNJkwaYwDJcyngJ8A3iieNHM3sIb9RMk7UttSeQiNwKrhQzwMcCTVeJUpZE6SJIkSdqRuiSGk6QjkRLDSZIkTWdRJYaTJEmSJFkCSUcgSZIkSTox6QgsIZRliVupjFaTDJbUR9LerZF3kiRJUptcNbDkMIBWliWuJRksqYuZzat2rR4kLQ30AfoCf28sfkeUGE5J4SRJFleyR6AdaECKeAdJD4bU712Sekb4EZLGRvwbJa1Qyq8XJVliSV+U9FjI+t4rac2I203Sn0POd7KkA0vmIem0KG+qpEsrqoQlyeBZIZw0AfiyXB75/Ch/qqQdI95qkm6Jsh6VtE2ED5F0laTRwFXAGcCgSD+oVSo+SZIkWYh0BNqHalLEywAXAgeF1O8VwC8j/k1m1s/MtgVmAN8qZlZDlvghYCcz
2w64Fjgpov8MmGNmW5vZNsD9Vey7KMrbChcH2qfGffzPzLY3s2vjfIUQJvpe2A++RHJilPUTfEvrClsAA83sYOA04Lqw/7pyQSkxnCRJ0jrk0ED7MAXX/z8buN3MRknaCtgKuCdewLsAL0X8rSSdCawCdAPuqqOMdYDroldhWeDpCB8IfLUSycxer5J2D0knASsAq+HaArdViVdusK+JPEdK6i5pFVzB8MAIv1/S6pIqYknDzeydOu4FM7sU3xOB5XpukmtekyRJWoh0BNqBGlLENwPTzGznKkmGAvvH3gCD8fkAjXEhcK6ZDZc0ABhSj22xQdAfgb5m9pykIdSWP36rdF5uoBtrsMvpkyRJkjYmHYF2oIYU8VlAD0k7m9kjMVSwqZlNA1YCXoqwQ4AXqmRbliVeuRDvsEL4PcDRwPFhy6qlXoFKo/9qbEh0EHBDnbc2CHhA0m748MMcSaPC5l+EQ/Kqmb0hLbRdwptxn42SEsNJkiQtR84RaB+qSRG/jze6Z4fU7yR8V0Dwcf3HgNGU5IILlGWJh+CSvuOBVwvxzgRWjQl9j+PbFH+Mmc0GLgOm4kMQY5twX+9KmojPV6jMYxgC7CDfCfEsFnRKijwAbJGTBZMkSdqWlBhOWgRJI4ATzazVtX9TYjhJkqTppMRwkiRJkiQLkXMEkhbBzAa0tw1JkiRJ01kiHYGY6T7XzH7bCnn3AnYxs7+2dN6tRdlmSX2Atczs73E+GF8lcEyd+TUpfqT5iZn9qo54syLvV2vFSWXBJEmSliOHBppOL+Br7W1EE+nFgjb3wZcutiU/aePykiRJkjpYYhwBSadKelLSQ8BmhfA+IW07WdLNklaV9ImYTY+kbSWZpPXi/N+SVgg53QskPSzpqYq0Lj7zvX/Mbj9B0vIFyd6JkvaIfO4oyOlOlHRaHJ8hlwweELK8N0h6QtKwipRv6b6OkzQ97L82wlaUdIWkMZH3fhHeS9IoSRPis0sVm0+mATlfST3kMsZj47NrjSpfS9Kdkv4p6ZxC+oOjLqaGYBKSzgK6RnnDIuzrYf8kSX+S1KXOrzpJkiRpQZaIoQFJO+BqeX3we5oAjI/LfwGONbMHJZ0B/NzMjo8GvDvQHxiHN5QPAa+Y2dvRJvfElfF6A8Px9fSn4LPj94myfwiYmW0tqTdwt6RNgVGR5zPAh0ClQe2P7wvQE9gO2BJ4EV8auCsuDVzkFGADM3tPrtQHcCpwv5kdHmFjJN0LvAJ8xszelbQJrvTXt4rNL1Po2o+u/grn41LFD4VzdBeweZVq7xP2vwfMlHQhMA84G9gBeD3qYn8zO0XSMSE/jKTNcc2BXc3sA0l/xLUG/rJwMY6kI4EjAbp071ErWpIkSdJElghHAG9cbzaztwEkDY+/KwOrmNmDEe9K4Po4fhhveHcHfoXr/wtvwCvcYmYfAdMVm/ZUYTdcxQ8zeyIa/oojcBwu7XsH8Bn5ZkEbmNlMufTvGDN7PmydhHfhlx2BycAwSbcAt0TYXsC+kk6M8+WB9XCH4qKYAzAv7GgqA/H1/JXz7pK6mdncUrz7zGxO2D4dWB9YHRhhZv+N8GF4/d5SSvtp3FkYG+V0xZ2YmqTEcJIkSeuwpDgCzWEk7kCsD9wKnIxL4hZnob1XOF6o274RxuJv40/han5rAEcwv6einP88qn8fX8Ab0y8Cp0raOmw50MxmFiPGJMmXgW3xYZ93m2gzkW4nM2ssbT2210LAlWb246YalyRJkrQsS4ojMBIYKunX+D19EfhTSNy+Lql/7Mh3KFDpHRiF7+430sw+kvQaPoGuscapLIVbkdC9P4YE1gNmmtn7kp4DvoyPyfcAfhufupC0FLCumT0QwxZfZf6mQ8dKOtbMTNJ2ZjYRlxV+Pu7nMHzjomo2NyTnezdwLPCbsKGPmU2q0+QxwAWS1sCHBg4mekuADyQtY2YfAPcBt0o6z8xekbQasJKZPVNPISkxnCRJ0nIsEZMFzWwCvhPe48A/
WFAW9zDgN3KJ2z54o1zZule4EwHeJT+7xm58RSYD8yQ9LukEfIOepSRNCRsGm1nlbXkUPufgnThehwWHHhqjC3B15D0RuCAkgH8BLANMljQtzglbDpNLB/dm/qY+ZZsbkvM9DugbkxOn4/MZ6sLMXsLnIzyAfxfjzezWuHxp2DvMzKYDP8XnEEzGe0x61ltOkiRJ0nKkxHCy2JESw0mSJE1HKTGcJEmSJEmZdASSJEmSpBOzpEwWbFPkkr23m9lW7W1LW6FWlG1uKh1FYjhlhZMkWRLIHoEkSZIk6cSkI9B8uki6TNI0SXdL6grVJY0jfISk8ySNkzRDUj9JN4VE75mVTOuR3pV0Wsj/TpV0qZx6ZJO/KOkxuSzxvZLWlLRU2NAj4i4l6V+V8xJbxH08Jem4gj0/CFumSjo+wnrJpZOHyqWfh0kaKGl0lLdjxKsql5wkSZK0DekINJ9NgD+Y2ZbAbODACP8LcLKZbQNMAX5eSPN+zNi8BBcxOhrYChgsafWS9G4fXKjnkCplX2Rm/WJooiuwj5m9AlSTTV6fkE3Gl0juZGbbAdcCJ4Vy4tWFcgYCj1fUAUv0Bj4L7Aj8XNIycnnnbwKfBHYCjpC0XcTfGPhdpOuNb3y0G3Ai8zchqsgl7wjsgS/1XLFcsKQjw4kaN+/tOVVMS5IkSZpDzhFoPk8XhHbGA73UsKQx+H4F4A7CtFh3j6SngHXxRrIe6d09JJ0ErACsBkwDbqNx2eR1gOvk8sbL4vLHAFfgjsnvgcOBP9e45ztCI+E9Sa8Aa4bNN5vZW3EvN+GOyPCooykRPg2XJbbQRegVedaSS55RLDglhpMkSVqHdASaT1lit2sT0nxUSv8R/l00Kr0raXlcOKivmT0Xk/iWj8uNySZfCJxrZsMlDQCGAEQ+L0vaE3/br9YLUbQf6pMVLt9j8f4raavKJSdJkiRtQzoCLUgjksb1UI/0bqXRf1VSN+AgfFdEaFw2eWXghTg+rFT25fgQwVVmNq8JNo/C5Z3Pwhv1A/D7rpdacsk1SYnhJEmSliPnCLQ8VSWN66Ee6d2QGL4MmIo3omML12bRsGzyEOD6mFT4aqn44fg+BrWGBWrZPAEYiu8z8BhweWMNeYlacslJkiRJG5ASwwkAkvoC55lZ//a2pTFSYjhJkqTp1JIYzqGBBEmnAN+l9tyAJEmSZAklhwYSzOwsM1vfzB5qb1uSJEmStiV7BDo4kuaaWTdJa+HbEB9UT/y2yFPS/sCTMbehWcSQxDfM7DhJg/HVEMc0lKa9JIZTUjhJkiWR7BFYTDCzFxtrsNshz/2BLRqLJKmmw2lm48zsuFrXkyRJktYlHYHFhJDsnRrHg0Oe+M6Q6z2nSvw1JD0iqeZrbCnPFST9TdJ0uTTyY/G2Xon7S0mPy+WT15S0C7AvvkJikqSNSnkPlXSJpMeAcyTtGPZMlPSwpM0i3gBJt7dIJSVJkiRNJocGFl/6ANvhIj0zJV1oZs8BSFoTXw74UzO7p878vge8bmZbSNoKmFS4tiLwqJmdGk7HEWZ2pqTh+C6MN1TJD1zJcBczm1eRPjazDyUNxJUPD6yRbiEkHQkcCdCle7VtEJIkSZLmkI7A4st9ZjYHQNJ0XE3wOXxN/n3A0QWp43rYDTgfwMymho5BhfeBylv7eOAzdeZ5fUGcaGXgSkmb4GqHyzTBtpQYTpIkaSVyaGDxpZbc74d4Y/3ZFizrA5svOFGPtHCFtwrHvwAeiI2Svsh8hcQkSZKkHckegSUPwzcOul7SyWZ2dp3pRgNfAR6QtAWwdR1p3gRWqjP/orzx4DrTVCUlhpMkSVqO7BFYAonu+IOBPSV9r85kfwR6xDDDmfiOho3t93st8KOYALhRI3HPAX4taSLpgCZJknQYUmI4AUBSF2AZM3s3GvV7gc3M7P12Nm0hUmI4SZKk6aTEcNIYK+DDAsvgGxd9ryM6AUmSJEnLko5AAoCZvQks5CkmSZIkSzbpCDQRSQ+b
2S5NTLM/iy7F+3fga7ENcVPT9gHWMrO/N7f8jkRKDCdJkrQcOVmwiTTVCQj2pw4p3kbK3bs5TkDQB9h7UcpPkiRJlkzSEWgikuaWZXElXRQb5iDprJDpnSzpt3VK8V4c0r1PRd5XSJohaWgh3qyQDe4V1y6TNE3S3ZK6RpwRFVngiDtL0rLAGcCgKH+QpBWjjDEx43+/SLNlhE0K+zepcv8XSxoXZZ8eYf0k3RTH+0l6R9KykpaX9FSEHyFpbMgU3xiSxitJejrmJSCpe/E8SZIkaX3SEWhBJK0OHABsaWbbAGea2cO43O+PzKyPmf27StJVgZ2BEyLuecCWwNbRrV9mE+APZrYlMJsGpHpjwt9pwHVR/nXAqcD9ZrYjsAfupKwIHAWcb2Z98PkCz1fJ8tSYdboN8ClJ2wAT8V4HgP7AVKAf8EngsQi/ycz6mdm2wAzgWzEvYQRQ6XP/asT7oFyopCPDARk37+3GVjUmSZIk9ZKOQMsyB3gX+D9JXwLerjPdbaHcNwV42cymmNlH+Fr+XlXiP21mk+J4fI04DbEXcIqkSXhDvDywHvAI8BNJJwPrm9k7VdJ+RdIEvPHfEtjCzD4E/i1pc2BH4Fxgd9wpGBXptpI0StIU4JBIC3A58M04/ibw52oGm9mlZtbXzPp2WWHlJt5ukiRJUoucLNg8PmRBJ2p5gNhQZ0fg08BBwDHAnnXkV5EL/ogFpYM/ovp3VJYX7lrFroYkfAUcaGYzS+EzYrfALwB/l/QdM7v/40TSBsCJQD8zez2GLirljAQ+D3yAaxAMBboAP4rrQ4H9zezxGEYZAGBmo2O4YwDQxcymNmA3kMqCSZIkLUn2CDSPZ4AtJC0naRW84UdSN2DlmJ1/ArBtxG+KFO+iMAvYIY4PKoSXy78LOFaSACRtF383BJ4yswuAW/Hu/yLd8f0D5sQOh58vXBsFHA88Ymb/BVYHNsOHCYjyX4rx/0NK+f4F+Cs1egOSJEmS1iMdgaZjsd3v3/BG7m94Nzl4Y3d77Nz3EPCDCG+KFO+i8FvguyHju0Yh/AHccZkkaRC+AdAywGRJ0+IcfK+BqTFksBXeQH+MmT2O3+sTeMM9unD5MWBNvGcAYDIwpbBZ0c8izuhIX2QYPk/immbcc5IkSbIIpMRwE4jJgBPMbP32tmVJQtJBwH5mdmg98VNiOEmSpOmkxPAiImktfGLdb9vZlCUKSRfiQwypc5AkSdIOLNZDAzHJrNHJZc3Mu4+kjxsnM3vRzDY1swurxP14/X4r2DEgtAgWG8o2S9pfvrVx5Xxo9AJgZsea2cZm9mR72JokSdLZyR6BKkhaGl8X3xdob1neAcBc4OF2tqMpDGBBm/cHbgeaLbFcJCWGkyRJWo7Fukcg6FJDZW8jSXdKGh/r13tH+BclPRYT9+6N2e9IGiLpKkmjgasoqfEVC5TUVdK1ofB3M/OX7yHpYElTJE2VdHaEfVnSuXH8/YLa3oZRXkU58HRJEyJ9b0m9cJGfE8KO/tELcr9c+e8+SetJ6hKKfJK0iqR5knaPfEdK2iTu74rovXhK0nHliox8hobtUySd0NS6rGLzp2hYWXEHSQ9G3ndJ6tn8RyFJkiRpKktCj8AmwMFmdoSkv+Eqe1cDlwJHmdk/JX0S+CO+pv8hYCczM0nfBk4Cfhh5bQHsZmbvyNe69zWzY6qU+V3gbTPbXK6sNwE+nkdwNr6E73XgbvmGQ6OiHHCRnf9JWjuORxbyfdXMtpf0PeBEM/u2pEuAuWb22yjjNuBKM7tS0uHABWa2v6SZYf8GYU9/uSbAulEHAL1xJcGVgJmSLi6p+PUB1jazraKsVSK87ro0sx9WsXk4cLuZ3RDnxN9lgAvxiYL/DYfrl8DhVeo8SZIkaQWWBEdgIZU9+Xr+XYDrK40OsFz8XQe4Lt48lwWeLuQ1vIaaXpndgQsAzGyyfLkguKzuiFhHj6RhwO5mdoukbpJWAtbFl95VlPduKuRbOR4PfKlG
2TsXrl0FnBPHoyLPDYBfA0cADwJjC2nvMLP3gPckvYIv9yvKCD8FbCifwHcH7sg0ty7rYTN8meI9kXcX4KVqESUdCRwJ0KV7jyYWkyRJktRiSRgaKKvsLY3f1+zQ1q98No84FwIXmdnWwHdYUIHvrVa082FcQncm3mj3xxv14lr8yr1U7qMpjIw8d8TnNayCj9WPKsSpVlcfY2av4yJII/Du/ctpfl3Wg4BphXy3NrO9qkVMieEkSZLWYUnoEVgIM3sjxsy/bGbXy183twlBnJWBFyLqYQ1k05Aa4Ejga8D9krZivgLfGOACSWvgQwMH440leIN8Rnwm4l3075hZYzvovIkr+lV4GN+c5ypcoa/S0I+JsKfM7F25KNB3gH0ayf9jwu73zezGGGq4upl1Wba5Vl3OBHpI2tnMHomhgk3NbFpDdqbEcJIkScuxJPQI1OIQ4FuSHsc379kvwofg3dzjgVcbSF9W4ytyMdBN0gy8YR8PYGYvAadE2seB8WZ2a6QZhQ8LjDSzecBz+Bh7Y9wGHFCZLAgcC3wzhiMOBb4fZb8XeT5aKG8lfCOjelkbGBFOxNXAjyO8qXVZtrmqsmLsjHgQcHbkPQkfhkiSJEnaiFQWTBY7UlkwSZKk6aiGsuCS3COQJEmSJEkjpCOQJEmSJJ2YdATaAEnHycWHhrVwvgMk3V7j2uUqyPrWiPOx1G9HQa0o15wkSZIszBK5aqAD8j1goJkV1+wjaWkz+7A1CjSzb7dGvh2BlBhOkiRpObJHoJUJlb0NgX9IOkElKWNJPSTdKGlsfHaNdCvKJYHHxGz7/WoU0U3SDZKekDQslvct8GYt6VuSnoy8LpN0USH97pIelssOL9Q7EHbcIelxufTwoAifJekcuRTxGEkbR3iT7kcNyDUnSZIkrU/2CLQyZnaUpM8Be5jZq5KGsKCU8V+B88zsIUnrAXcBmwOnAveb2eFyqd8xku41s7Lo0XbAlsCLuDjRrhSWJcplj38GbI+v578fX9pYoSewGy4/PBy4oZT/54AXzewLkV9RzWeOmW0t6RvA73HNgvObcj+41sFCcs1lUlkwSZKkdUhHoH0oShkPxPUKKte6h6zvXsC+kk6M8OWB9YAZpbzGVIYcYv1/LxbUJ9gReNDMXos41wObFq7fYmYfAdMVGzCVmAL8Tr6B0u1mVlQqvKbw97xm3k8tueYFMLNL8T0PWK7nJrnmNUmSpIVIR6B9KL7VL4Vv3PNuMUJ08R9oZjMbyatB2eA6KKZX+aKZPSlpe2Bv4ExJ95nZGZXLxajxt0n3U3AYkiRJknYgHYH2525cLfA3AJL6xCZKdwHHSjo2dvfbzswmNiP/scDvJa2KDw0cSBPUBmNo4TUzu1rSbKA4CXEQcFb8faSZ91NLrrkmKTGcJEnScqQj0P4cB/whusSXxhvGo4Bf4OPukyUthe/sV/e+ARXM7AVJv8L3IngNeAJobH+DIlsDv5H0EfABvgVzhVXD7vfwfRWacz8XA3+WyzXPIOSakyRJkrYhJYY7AZK6mdlcSUsDNwNXmNnNi5jnLKCvmTW0X0OrkBLDSZIkTSclhjs3Q2Ii4VT8TfyWdrUmSZIk6TDk0EAnwMxObDxWk/Ps1dJ5JkmSJG1P9gi0IZLWlfSApOmSpkn6fpU4P5Rkktaocm1wSQyoNW29RtLkEEE6Q9LACD9e0gptYUOSJEnS+mSPQNvyIfBDM5sgaSVgvKR7zGw6uKOAr7d/ti2NUknqWNL/A/qZ2cZVoh8PXA283UbmLURbSQynpHCSJJ2B7BFoQ8zsJTObEMdv4rPk1y5EOQ84iQXX55dZS9Kdkv4p6ZxKoKSDQ+53aoj/VMLnFo4PkjQ0jodKukTSY8A5LMjdwNqSJknqH3EPknQcsBbwgKQHKvlL+mVIED9aESVqQGr4U5HvpJAaXklST0kjI2yqpP5Nq9kkSZKkuaQj0E5I6oXLAz8W5/sBL5jZ4w2lA/rg6/a3BgbFcMNa
wNnAnnG9n6T96zBjHWAXM/tBKXxf4N9m1qeoJGhmF+BSxnuY2R4RvCLwqJltiy8VPCLCK1LD/XDtgssj/ETgaDPrA/QH3sF1BO6KsG2BSWVDJR0paZykcfPebsrqxyRJkqQhcmigHQjJ3RuB483sjRhz/wk+LNAY95nZnMhnOrA+sDowwsz+G+HDcOneWxrJ63ozm9e8u/iY94HKVsjjgc/EcS2p4dHAuWHjTWb2vKSxwBWSlsEljyeVC0mJ4SRJktYhewTamGjsbgSGmdlNEbwRsAHweKzPXweYEGP1ZZoqKVxsNJcvXStvYNQcPrD5YhRFeypSw33is7aZzTWzs3B1wq7AaEm9zWwk7ri8AAyVb2KUJEmStAHZI9CGhN7+/wEzzOzcSriZTQE+UYg3i6aJ9YwBLoiVBq/jKn8XxrWXJW0OzAQOwGWGF4U3gZWAxmyrKjUsaaO43ymS+gG9Jb0DPG9ml0laDt8p8S+1Mk6J4SRJkpYjewTall2BQ4E9CxPm9l7UTM3sJeAU4AF8i+HxZnZrXD4F77p/GHhpUcvCu+fvrEwWbIDjgL6xBHE6LjMMcHxMCJyMSxb/AxiA94ZMxOc/nN8CdiZJkiR1kBLDyWJHSgwnSZI0nVoSw+kIJIsdkt7Ehzo6EmvQ+HBJe9AR7eqINkHHtKsj2gQd066OaBN0LLvWN7Me5cCcI5Asjsys5tW2J5LGdTSboGPa1RFtgo5pV0e0CTqmXR3RJui4dhXJOQJJkiRJ0olJRyBJkiRJOjHpCCSLI5e2twFV6Ig2Qce0qyPaBB3Tro5oE3RMuzqiTdBx7fqYnCyYJEmSJJ2Y7BFIkiRJkk5MOgJJkiRJ0olJRyBZbJD0OUkzJf1L0intaMcVkl6RNLUQtpqke2J76HskrdrGNq0r6QFJ0yVNk/T9DmLX8pLGxDbV0ySdHuEbSHosvsvrJC3blnaFDV1iK+zbO5BNs+TbiU+SNC7C2vs7XEXSDZKekDRD0s4dwKbNCuqskyS9Ien4DmDXCfGcT5V0TTz/7f5cNUY6AsligaQuwB+AzwNbAAdL2qKdzBkKfK4Udgq+M+QmwH1x3pZ8CPzQzLYAdgKOjvppb7veA/aMbar7AJ+TtBO+bfZ5ZrYxvj/Gt9rYLoDvAzMK5x3BJvBtvvsU1p6393d4PnCnmfXGtwmf0d42mdnMyoZmwA7A28DN7WmXpLUJaXUz2wroAnyVjvNc1cbM8pOfDv8BdgbuKpz/GPhxO9rTC5haOJ8J9IzjnrjoUXvW1634ltAdxi5gBWAC8ElcaW3pat9tG9myDt5Q7InvxaH2tinKnQWsUQprt+8QWBl4mphY3hFsqmLjXsDo9rYLWBt4DlgNF+u7HfhsR3iuGvtkj0CyuFD5J6vwfIR1FNY03/wJ4D/Amu1liKRewHbAYx3BruiCnwS8AtwD/BuYbWYfRpT2+C5/D5wEfBTnq3cAm8C3Db9b0nhJR0ZYe36HGwD/Bf4cwyiXS1qxnW0q81XgmjhuN7vM7AXgt8Cz+AZvc4DxdIznqkHSEUiSFsbc9W+XdbmSugE3Aseb2RsdwS4zm2fehbsOsCPQu61tKCJpH+AVMxvfnnbUYDcz2x4fAjta0u7Fi+3wHS6Nbwt+sZltB7xFqbu9nZ/3ZYF9gevL19rarpiPsB/uPK0FrMjCQ4gdknQEksWFF4B1C+frRFhH4WVJPQHi7yttbYCkZXAnYJiZ3dRR7KpgZrPxrbJ3BlaRVNnrpK2/y12BfSXNAq7FhwfOb2ebgI/fKjGzV/Ax7x1p3+/weeB5M3sszm/AHYOO8lx9HphgZi/HeXvaNRB42sz+a2YfADfhz1q7P1eNkY5AsrgwFtgkZuAui3cHDm9nm4oMBw6L48PwMfo2Q5KA/wNmmNm5HciuHpJWieOu+LyFGbhDcFB72GVmPzazdcysF/4c3W9mh7SnTQCSVpS0UuUYH/ueSjt+h2b2H+A5SZtF0KeB6e1pU4mDmT8s
AO1r17PATpJWiP/HSl2163NVD6ksmCw2SNobH9vtAlxhZr9sJzuuAQbg24u+DPwcuAX4G7Ae8AzwFTN7rQ1t2g0YBUxh/rj3T/B5Au1p1zbAlfh3thTwNzM7Q9KG+Nv4asBE4Otm9l5b2VWwbwBwopnt0942Rfk3x+nSwF/N7JeSVqd9v8M+wOXAssBTwDeJ77K9bAq7VsQb3w3NbE6EtXddnQ4MwlfxTAS+jc8JaPdnvSHSEUiSJEmSTkwODSRJkiRJJyYdgSRJkiTpxKQjkCRJkiSdmHQEkiRJkqQTk45AkiRJknRi0hFIkiRJkk5MOgJJkiRJ0on5/1Sr5o88aAmuAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "df1.groupby('desc').sum()['count'].sort_values().plot(kind='barh', title='Top 20 in the hotel descriptions after removing stopwords')\n",
    "plt.show()  # Plot the top 20 as a horizontal bar chart"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The cells above cover the visualization part"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      located southern tip lake union hilton garden ...\n",
       "1      located citys vibrant core sheraton grand seat...\n",
       "2      located heart downtown seattle awardwinning cr...\n",
       "3      whats near hotel downtown seattle location bet...\n",
       "4      situated amid incredible shopping iconic attra...\n",
       "                             ...                        \n",
       "147    located queen anne district halcyon suite du j...\n",
       "148    block world famous space needle seattle center...\n",
       "149    stay alfred wall street resides heart belltown...\n",
       "150    perfect marriage heightened convenience unbeat...\n",
       "151    yes true every room citizenm best every room s...\n",
       "Name: desc_clean, Length: 152, dtype: object"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Text preprocessing\n",
    "REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\\[\\]\\|@,;]')\n",
    "BAD_SYMBOLS_RE = re.compile('[^0-9a-z #+_]')\n",
    "STOPWORDS = set(stopwords.words('english'))\n",
    "# Clean a single description string\n",
    "def clean_text(text):\n",
    "    # Convert to lowercase\n",
    "    text = text.lower()\n",
    "    # Replace certain special symbols (e.g. punctuation) with a space\n",
    "    text = REPLACE_BY_SPACE_RE.sub(' ', text)\n",
    "    # Remove every character matched by BAD_SYMBOLS_RE\n",
    "    text = BAD_SYMBOLS_RE.sub('', text)\n",
    "    # Remove stopwords from the text\n",
    "    text = ' '.join(word for word in text.split() if word not in STOPWORDS)\n",
    "    return text\n",
    "# Clean the desc column\n",
    "df['desc_clean'] = df['desc'].apply(clean_text)\n",
    "df['desc_clean']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>address</th>\n",
       "      <th>desc</th>\n",
       "      <th>desc_clean</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Hilton Garden Seattle Downtown</th>\n",
       "      <td>1821 Boren Avenue, Seattle Washington 98101 USA</td>\n",
       "      <td>Located on the southern tip of Lake Union, the...</td>\n",
       "      <td>located southern tip lake union hilton garden ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sheraton Grand Seattle</th>\n",
       "      <td>1400 6th Avenue, Seattle, Washington 98101 USA</td>\n",
       "      <td>Located in the city's vibrant core, the Sherat...</td>\n",
       "      <td>located citys vibrant core sheraton grand seat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Crowne Plaza Seattle Downtown</th>\n",
       "      <td>1113 6th Ave, Seattle, WA 98101</td>\n",
       "      <td>Located in the heart of downtown Seattle, the ...</td>\n",
       "      <td>located heart downtown seattle awardwinning cr...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Kimpton Hotel Monaco Seattle</th>\n",
       "      <td>1101 4th Ave, Seattle, WA98101</td>\n",
       "      <td>What?s near our hotel downtown Seattle locatio...</td>\n",
       "      <td>whats near hotel downtown seattle location bet...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>The Westin Seattle</th>\n",
       "      <td>1900 5th Avenue, Seattle, Washington 98101 USA</td>\n",
       "      <td>Situated amid incredible shopping and iconic a...</td>\n",
       "      <td>situated amid incredible shopping iconic attra...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>The Halcyon Suite Du Jour</th>\n",
       "      <td>1125 9th Ave W, Seattle, WA 98119</td>\n",
       "      <td>Located in Queen Anne district, The Halcyon Su...</td>\n",
       "      <td>located queen anne district halcyon suite du j...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Vermont Inn</th>\n",
       "      <td>2721 4th Ave, Seattle, WA 98121</td>\n",
       "      <td>Just a block from the world famous Space Needl...</td>\n",
       "      <td>block world famous space needle seattle center...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Stay Alfred on Wall Street</th>\n",
       "      <td>2515 4th Ave, Seattle, WA 98121</td>\n",
       "      <td>Stay Alfred on Wall Street resides in the hear...</td>\n",
       "      <td>stay alfred wall street resides heart belltown...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Pike's Place Lux Suites by Barsala</th>\n",
       "      <td>2nd Ave and Stewart St, Seattle, WA 98101</td>\n",
       "      <td>The perfect marriage of heightened convenience...</td>\n",
       "      <td>perfect marriage heightened convenience unbeat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>citizenM Seattle South Lake Union hotel</th>\n",
       "      <td>201 Westlake Ave N, Seattle, WA 98109</td>\n",
       "      <td>Yes, it's true. Every room at citizenM is the ...</td>\n",
       "      <td>yes true every room citizenm best every room s...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>152 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                 address  \\\n",
       "name                                                                                       \n",
       "Hilton Garden Seattle Downtown           1821 Boren Avenue, Seattle Washington 98101 USA   \n",
       "Sheraton Grand Seattle                    1400 6th Avenue, Seattle, Washington 98101 USA   \n",
       "Crowne Plaza Seattle Downtown                            1113 6th Ave, Seattle, WA 98101   \n",
       "Kimpton Hotel Monaco Seattle                              1101 4th Ave, Seattle, WA98101   \n",
       "The Westin Seattle                        1900 5th Avenue, Seattle, Washington 98101 USA   \n",
       "...                                                                                  ...   \n",
       "The Halcyon Suite Du Jour                              1125 9th Ave W, Seattle, WA 98119   \n",
       "Vermont Inn                                              2721 4th Ave, Seattle, WA 98121   \n",
       "Stay Alfred on Wall Street                               2515 4th Ave, Seattle, WA 98121   \n",
       "Pike's Place Lux Suites by Barsala             2nd Ave and Stewart St, Seattle, WA 98101   \n",
       "citizenM Seattle South Lake Union hotel            201 Westlake Ave N, Seattle, WA 98109   \n",
       "\n",
       "                                                                                      desc  \\\n",
       "name                                                                                         \n",
       "Hilton Garden Seattle Downtown           Located on the southern tip of Lake Union, the...   \n",
       "Sheraton Grand Seattle                   Located in the city's vibrant core, the Sherat...   \n",
       "Crowne Plaza Seattle Downtown            Located in the heart of downtown Seattle, the ...   \n",
       "Kimpton Hotel Monaco Seattle             What?s near our hotel downtown Seattle locatio...   \n",
       "The Westin Seattle                       Situated amid incredible shopping and iconic a...   \n",
       "...                                                                                    ...   \n",
       "The Halcyon Suite Du Jour                Located in Queen Anne district, The Halcyon Su...   \n",
       "Vermont Inn                              Just a block from the world famous Space Needl...   \n",
       "Stay Alfred on Wall Street               Stay Alfred on Wall Street resides in the hear...   \n",
       "Pike's Place Lux Suites by Barsala       The perfect marriage of heightened convenience...   \n",
       "citizenM Seattle South Lake Union hotel  Yes, it's true. Every room at citizenM is the ...   \n",
       "\n",
       "                                                                                desc_clean  \n",
       "name                                                                                        \n",
       "Hilton Garden Seattle Downtown           located southern tip lake union hilton garden ...  \n",
       "Sheraton Grand Seattle                   located citys vibrant core sheraton grand seat...  \n",
       "Crowne Plaza Seattle Downtown            located heart downtown seattle awardwinning cr...  \n",
       "Kimpton Hotel Monaco Seattle             whats near hotel downtown seattle location bet...  \n",
       "The Westin Seattle                       situated amid incredible shopping iconic attra...  \n",
       "...                                                                                    ...  \n",
       "The Halcyon Suite Du Jour                located queen anne district halcyon suite du j...  \n",
       "Vermont Inn                              block world famous space needle seattle center...  \n",
       "Stay Alfred on Wall Street               stay alfred wall street resides heart belltown...  \n",
       "Pike's Place Lux Suites by Barsala       perfect marriage heightened convenience unbeat...  \n",
       "citizenM Seattle South Lake Union hotel  yes true every room citizenm best every room s...  \n",
       "\n",
       "[152 rows x 3 columns]"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Modeling\n",
    "df.set_index('name', inplace=True)  # use the 'name' column as the index\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### TF-IDF: extract keywords according to (min_df, max_df) and build the TF-IDF matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TF-IDF feature names: ['000', '000 sq', '000 sq ft', '000 square', '000 square feet', '10', '100', '100 nonsmoking', '10minute', '11', '12', '120', '14', '15', '15 miles', '15 miles hotel', '15 minutes', '16', '17', '1900', '1900s', '1926', '1928', '1962', '20', '20 minute', '20 minute walk', '20 minutes', '200', '2015', '2017', '2018', '21', '24', '24 hours', '24 hours day', '24hour', '24hour business', '24hour business center', '24hour desk', '24hour desk property', '24hour fitness', '24hour fitness center', '24hour shuttle', '24hour shuttle airport', '24hour shuttle service', '26', '29', '29 km', '30', '30 minutes', '31', '32inch', '32inch flatscreen', '37inch', '40', '405', '42', '42 flat', '4th', '500', '500 companies', '50inch', '50inch hdtv', '55', '5th', '5th avenue', '60', '70', '76', '99', 'aaa', 'aaa diamond', 'access', 'access 32inch', 'access 32inch flatscreen', 'access best', 'access best seattle', 'access business', 'access business center', 'access city', 'access i5', 'access light', 'access light rail', 'access local', 'access major', 'access major freeways', 'access microwaves', 'access microwaves refrigerators', 'access popular', 'access popular attractions', 'access seasonal', 'access seattle', 'accessible', 'acclaimed', 'accommodate', 'accommodate private', 'accommodate private business', 'accommodating', 'accommodation'] \n",
      "Total number of features: 3154\n"
     ]
    }
   ],
   "source": [
    "# Extract text features with TF-IDF\n",
    "# (scikit-learn 1.2+ removed get_feature_names(); use get_feature_names_out() there)\n",
    "tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=0.01, stop_words='english')\n",
    "tfidf_matrix = tf.fit_transform(df['desc_clean'])\n",
    "print('TF-IDF feature names:', tf.get_feature_names()[0:100], '\\nTotal number of features:', len(tf.get_feature_names()))  # show the first 100 features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'scipy.sparse.csr.csr_matrix'> (152, 3154) \n",
      "   (0, 376)\t0.0988145082229999\n",
      "  (0, 2334)\t0.10493558569292807\n",
      "  (0, 1208)\t0.09018733407575145\n",
      "  (0, 42)\t0.08690743468812835\n",
      "  (0, 2)\t0.10493558569292807\n",
      "  (0, 2179)\t0.10493558569292807\n",
      "  (0, 1777)\t0.0940666316769628\n",
      "  (0, 1170)\t0.10493558569292807\n",
      "  (0, 591)\t0.10493558569292807\n",
      "  (0, 2420)\t0.0988145082229999\n",
      "  (0, 1471)\t0.0988145082229999\n",
      "  (0, 2592)\t0.0988145082229999\n",
      "  (0, 1671)\t0.0988145082229999\n",
      "  (0, 375)\t0.0988145082229999\n",
      "  (0, 169)\t0.10493558569292807\n",
      "  (0, 256)\t0.0988145082229999\n",
      "  (0, 2784)\t0.09018733407575145\n",
      "  (0, 2332)\t0.07729044349327371\n",
      "  (0, 1204)\t0.05524527771087949\n",
      "  (0, 1867)\t0.10493558569292807\n",
      "  (0, 2819)\t0.10493558569292807\n",
      "  (0, 1026)\t0.05594295429536109\n",
      "  (0, 41)\t0.0840662566058233\n",
      "  (0, 2116)\t0.10493558569292807\n",
      "  (0, 404)\t0.05818473416407789\n",
      "  :\t:\n",
      "  (151, 2313)\t0.10938920138371673\n",
      "  (151, 552)\t0.06208992001743357\n",
      "  (151, 1108)\t0.0817261625451344\n",
      "  (151, 238)\t0.05578546497360744\n",
      "  (151, 551)\t0.10633117218815535\n",
      "  (151, 3010)\t0.0911256945313327\n",
      "  (151, 592)\t0.11650263971349294\n",
      "  (151, 1167)\t0.23300527942698587\n",
      "  (151, 313)\t0.11875056417384842\n",
      "  (151, 724)\t0.07612840002428264\n",
      "  (151, 564)\t0.08020533779422331\n",
      "  (151, 298)\t0.08020533779422331\n",
      "  (151, 2581)\t0.10633117218815535\n",
      "  (151, 3130)\t0.06069589886713962\n",
      "  (151, 2636)\t0.08894293360397607\n",
      "  (151, 1765)\t0.07053063750343089\n",
      "  (151, 1916)\t0.0817261625451344\n",
      "  (151, 1612)\t0.05812072439840433\n",
      "  (151, 2591)\t0.09351663860846192\n",
      "  (151, 2318)\t0.04330766028296035\n",
      "  (151, 2624)\t0.04231073735612507\n",
      "  (151, 1578)\t0.06208992001743357\n",
      "  (151, 1439)\t0.0613832274944592\n",
      "  (151, 1307)\t0.18208769660141885\n",
      "  (151, 1334)\t0.03359231306092802\n"
     ]
    }
   ],
   "source": [
    "print(type(tfidf_matrix), tfidf_matrix.shape, '\\n', tfidf_matrix)  # tfidf_matrix is also a sparse matrix, and a fairly large one"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: compute the hotel-to-hotel similarity matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.        , 0.0391713 , 0.10519839, ..., 0.04506191, 0.01188579,\n",
       "        0.02732358],\n",
       "       [0.0391713 , 1.        , 0.06121291, ..., 0.06131857, 0.01508036,\n",
       "        0.03706011],\n",
       "       [0.10519839, 0.06121291, 1.        , ..., 0.09179925, 0.04235642,\n",
       "        0.05607314],\n",
       "       ...,\n",
       "       [0.04506191, 0.06131857, 0.09179925, ..., 1.        , 0.0579826 ,\n",
       "        0.04145794],\n",
       "       [0.01188579, 0.01508036, 0.04235642, ..., 0.0579826 , 1.        ,\n",
       "        0.0172546 ],\n",
       "       [0.02732358, 0.03706011, 0.05607314, ..., 0.04145794, 0.0172546 ,\n",
       "        1.        ]])"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
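   ],
   "source": [
    "# Compute the cosine similarity between hotels. TfidfVectorizer L2-normalizes each row by default,\n",
    "# so the linear kernel (plain dot product) equals the cosine similarity here.\n",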
    "cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)\n",
    "cosine_similarities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0               Hilton Garden Seattle Downtown\n",
       "1                       Sheraton Grand Seattle\n",
       "2                Crowne Plaza Seattle Downtown\n",
       "3                Kimpton Hotel Monaco Seattle \n",
       "4                           The Westin Seattle\n",
       "                        ...                   \n",
       "147                  The Halcyon Suite Du Jour\n",
       "148                                Vermont Inn\n",
       "149                 Stay Alfred on Wall Street\n",
       "150         Pike's Place Lux Suites by Barsala\n",
       "151    citizenM Seattle South Lake Union hotel\n",
       "Name: name, Length: 152, dtype: object"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "indices = pd.Series(df.index)  # df.index holds the hotel names\n",
    "indices"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: for a given hotel, output the Top-K most similar hotels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "idx= 49\n",
      "['Embassy Suites by Hilton Seattle Tacoma International Airport', 'DoubleTree by Hilton Hotel Seattle Airport', 'Seattle Airport Marriott', 'Motel 6 Seattle Sea-Tac Airport South', 'Knights Inn Tukwila', 'Four Points by Sheraton Downtown Seattle Center', 'Radisson Hotel Seattle Airport', 'Hampton Inn Seattle/Southcenter', 'Home2 Suites by Hilton Seattle Airport', 'Red Lion Hotel Seattle Airport Sea-Tac']\n"
     ]
    }
   ],
   "source": [
    "# Recommend the top 10 hotels for a given hotel name, based on the similarity matrix\n",
    "def recommendations(name, cosine_similarities = cosine_similarities):\n",
    "    # Look up the positional index of the queried hotel\n",
    "    idx = indices[indices == name].index[0]\n",
    "    print('idx=', idx)\n",
    "    # Drop the queried hotel itself so it is never recommended,\n",
    "    # then sort its cosine-similarity row in descending order\n",
    "    score_series = pd.Series(cosine_similarities[idx]).drop(idx).sort_values(ascending=False)\n",
    "    # Take the 10 most similar hotels\n",
    "    top_10_indexes = list(score_series.iloc[:10].index)\n",
    "    # Map the positions back to hotel names\n",
    "    return [df.index[i] for i in top_10_indexes]\n",
    "print(recommendations('Hilton Seattle Airport & Conference Center'))  # prints the hotel's index and its 10 most similar hotels"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
